Posted to dev@airflow.apache.org by Yu Qian <yu...@gmail.com> on 2020/08/04 10:08:55 UTC

Re: [AIP-34] Rewrite SubDagOperator

Hi Jarek,

I agree we should not change the behaviour of the existing SubDagOperator
till Airflow 2.1. Is it okay to continue the discussion about TaskGroup as
a brand new concept/feature independent from the existing SubDagOperator?
In other words, shall we add TaskGroup as a UI grouping concept like Ash
suggested, and not touch SubDagOperator at all? Whenever we are ready with
TaskGroup, we can then deprecate SubDagOperator in Airflow 2.1.

I really like Ash's idea of simplifying the SubDagOperator idea into a
simple UI grouping concept. I think Xinbin's idea of "reattaching all the
tasks to the root DAG" is the way to go. And I see James pointed out we
need some helper functions to simplify dependencies setting of TaskGroup.
Xinbin put up a pretty elegant example in his PR
<https://github.com/apache/airflow/pull/9243>. I think having TaskGroup as
a UI concept should be a relatively small change. We can simplify Xinbin's
PR further. So I put up this alternative proposal here:
https://github.com/apache/airflow/pull/10153
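
To make the "grouping only" idea concrete, here is a minimal sketch in plain Python. It does not import Airflow and is not the code from either PR: `Task`, `TaskGroup`, `add()`, `roots()` and `leaves()` are hypothetical names used only for illustration. The point it tries to show is that a group never executes anything; `start >> group >> end` only expands into ordinary task-to-task dependencies.

```python
class Task:
    """Toy stand-in for an operator (hypothetical, not Airflow's BaseOperator)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()    # ids of tasks that must finish first
        self.downstream = set()  # ids of tasks that wait on this one

    def __rshift__(self, other):
        # `task >> task` or `task >> group`: wire to the target's entry points.
        for target in _entry_points(other):
            self.downstream.add(target.task_id)
            target.upstream.add(self.task_id)
        return other


class TaskGroup:
    """Hypothetical grouping object: it labels tasks and never schedules them."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = []

    def add(self, task):
        self.tasks.append(task)
        return task

    def roots(self):
        # Tasks with no upstream dependency inside the group.
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.upstream & ids)]

    def leaves(self):
        # Tasks with no downstream dependency inside the group.
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.downstream & ids)]

    def __rshift__(self, other):
        # `group >> x`: every leaf of the group becomes upstream of x.
        for leaf in self.leaves():
            leaf >> other
        return other


def _entry_points(node):
    """A group is entered through its roots; a plain task is its own entry."""
    return node.roots() if isinstance(node, TaskGroup) else [node]


start, end = Task("start"), Task("end")
group = TaskGroup("download")
t1, t2, t3 = (group.add(Task(f"download.task{i}")) for i in (1, 2, 3))

# Dependencies inside the group are set first, because roots and leaves
# are computed at the moment the group is wired to other tasks.
t1 >> t3
t2 >> t3
start >> group >> end
```

In this sketch all five tasks would live in one flat DAG; the group only records membership, which is the information a UI would need to collapse `download.*` into a single node.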

I have not done any UI changes due to lack of experience with web UI. If
anyone's interested, please take a look at the PR.

Qian

On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Similar point here to the other ideas that are popping up. Maybe we should
> just focus on completing 2.0 and move all discussions about further
> improvements to 2.1? While those are important discussions (and we should
> continue them in the near future!), I think delivering 2.0 in its current
> shape should be our focus now.
>
> J.
>
> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <bi...@gmail.com>
> wrote:
>
> > Hi Daniel
> >
> > I agree that the TaskGroup should have the same API as a DAG object
> > related to task dependencies, but it will not have anything related to
> > actual execution or scheduling.
> > I will update the AIP according to this over the weekend.
> >
> > > We could even make a “DAGTemplate” object s.t. when you import the
> > > object you can import it with parameters to determine the shape of
> > > the DAG.
> >
> > Can you elaborate a bit more on this? Does it serve a similar purpose as
> > a DAG factory function?
> >
> >
> >
> > On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> daniel.imberman@gmail.com
> > >
> > wrote:
> >
> > > Hi Bin,
> > >
> > > Why not give the TaskGroup the same API as a DAG object (e.g. the
> > > bitwise operator for task dependencies)? We could even make a
> > > “DAGTemplate” object s.t. when you import the object you can import it
> > > with parameters to determine the shape of the DAG.
> > >
> > >
> > > On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <bi...@gmail.com>
> > > wrote:
> > > The TaskGroup will not take schedule interval as a parameter itself;
> > > it depends on the DAG it is attached to. In my opinion, the TaskGroup
> > > will only contain a group of tasks with interdependencies, and the
> > > TaskGroup behaves like a task. It doesn't contain any
> > > execution/scheduling logic (i.e. schedule_interval, concurrency,
> > > max_active_runs etc.) like a DAG does.
> > >
> > > > For example, there is the scenario that the schedule interval of DAG
> > > > is 1 hour and the schedule interval of TaskGroup is 20 min.
> > >
> > > I am curious why you ask this. Is this a use case that you want to
> > > achieve?
> > >
> > > Bin
> > >
> > > On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <th...@gmail.com> wrote:
> > >
> > > > Hi Bin,
> > > > > Using TaskGroup, is the schedule interval of TaskGroup the same as
> > > > > the parent DAG? My main concern is whether the schedule interval of
> > > > > TaskGroup could be different from that of the DAG. For example, there
> > > > > is the scenario that the schedule interval of DAG is 1 hour and the
> > > > > schedule interval of TaskGroup is 20 min.
> > > >
> > > > Cheers,
> > > > Nicholas
> > > >
> > > > On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <bin.huangxb@gmail.com
> >
> > > > wrote:
> > > >
> > > > > Hi Nicholas,
> > > > >
> > > > > > I am not sure about the old behavior of SubDagOperator, maybe it will
> > > > > > throw an error? But in the original proposal, the subdag's
> > > > > > schedule_interval will be ignored. Or if we decide to use TaskGroup to
> > > > > > replace SubDag, there will be no subdag schedule_interval.
> > > > >
> > > > > Bin
> > > > >
> > > > > On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <th...@gmail.com>
> > wrote:
> > > > >
> > > > > > Hi Bin,
> > > > > > > Thanks for your good proposal. I am unsure whether the schedule
> > > > > > > interval of a SubDAG can be different from that of the parent DAG. I
> > > > > > > have discussed the schedule interval of SubDAG with Jiajie Zhong. If
> > > > > > > the SubDagOperator has a different schedule interval, what will happen
> > > > > > > when the scheduler schedules the parent DAG?
> > > > > >
> > > > > > Regards,
> > > > > > Nicholas Jiang
> > > > > >
> > > > > > On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > bin.huangxb@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > > Thank you Max, Kaxil, and everyone for the feedback!
> > > > > > > >
> > > > > > > > I have rethought the concept of subdag and task groups. I think the
> > > > > > > > better way to approach this is to entirely remove subdag and introduce
> > > > > > > > the concept of TaskGroup, which is a container of tasks along with
> > > > > > > > their dependencies *without the execution/scheduling logic of a DAG*.
> > > > > > > > Its only purpose is to group a list of tasks, but you still need to add
> > > > > > > > it to a DAG for execution.
> > > > > > >
> > > > > > > Here is a small code snippet.
> > > > > > >
> > > > > > > ```
> > > > > > > class TaskGroup:
> > > > > > > """
> > > > > > > A TaskGroup contains a group of tasks.
> > > > > > >
> > > > > > > If default_args is missing, it will take default args from the
> > > > DAG.
> > > > > > > """
> > > > > > > def __init__(self, group_id, default_args):
> > > > > > > pass
> > > > > > >
> > > > > > >
> > > > > > > """
> > > > > > > You can add tasks to a task group similar to adding tasks to a
> > DAG
> > > > > > >
> > > > > > > This can be declared in a separate file from the dag file
> > > > > > > """
> > > > > > > download_group = TaskGroup(group_id='download',
> > > > > > default_args=default_args)
> > > > > > > download_group.add_task(task1)
> > > > > > > task2.dag = download_group
> > > > > > >
> > > > > > > with download_group:
> > > > > > > task3 = DummyOperator(task_id='task3')
> > > > > > >
> > > > > > > [task, task2] >> task3
> > > > > > >
> > > > > > >
> > > > > > > """Add it to a DAG for execution"""
> > > > > > > with DAG(dag_id='start_download_dag',
> default_args=default_args,
> > > > > > > schedule_interval='@daily', ...) as dag:
> > > > > > > start = DummyOperator(task_id='start')
> > > > > > > start >> download_group
> > > > > > > # this is equivalent to
> > > > > > > # start >> [task, task2] >> task3
> > > > > > > ```
> > > > > > >
> > > > > > > > With this, we can still reuse a group of tasks and set dependencies
> > > > > > > > between them; it avoids the boilerplate code from using SubDagOperator,
> > > > > > > > and we can declare dependencies as `task >> task_group >> task`.
> > > > > > > >
> > > > > > > > User migration wise, we can introduce it before Airflow 2.0 and allow
> > > > > > > > gradual transition. Then we can decide if we still want to keep the
> > > > > > > > SubDagOperator or simply remove it.
> > > > > > >
> > > > > > > Any thoughts?
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Bin
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > > > > > > maximebeauchemin@gmail.com> wrote:
> > > > > > >
> > > > > > > > +1, proposal looks good.
> > > > > > > >
> > > > > > > > > The original intention was really to have task groups and a zoom-in/out
> > > > > > > > > in the UI. The original reasoning was to reuse the DAG object since it
> > > > > > > > > is a group of tasks, but as highlighted here it does create underlying
> > > > > > > > > confusion since a DAG is much more than just a group of tasks.
> > > > > > > >
> > > > > > > > Max
> > > > > > > >
> > > > > > > > On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > > > > > joshipoornima06@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thank you for your email.
> > > > > > > > >
> > > > > > > > > On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > > > bin.huangxb@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > > > - *Unpack SubDags during dag parsing*: This rewrites
> > the
> > > > > > > > > > *DagBag.bag_dag*
> > > > > > > > > > > > method to unpack subdag while parsing, and it will
> > give a
> > > > > > flat
> > > > > > > > > > > > structure at
> > > > > > > > > > > > the task level
> > > > > > > > > > >
> > > > > > > > > > > The serialized_dag representation already does this I
> > > think.
> > > > At
> > > > > > > least
> > > > > > > > > if
> > > > > > > > > > > I've understood your idea here correctly.
> > > > > > > > > >
> > > > > > > > > > > I am not sure about serialized_dag representation, but at least it will
> > > > > > > > > > > still keep the subdag entry in the DAG table? In my proposal, as also in
> > > > > > > > > > > the draft PR, the idea is to *extract the tasks from the subdag and add
> > > > > > > > > > > them back to the root_dag*. So the runtime DAG graph will look exactly
> > > > > > > > > > > the same as without subdag but with metadata attached to those sections.
> > > > > > > > > > > This metadata will later be used for rendering in the UI. So after
> > > > > > > > > > > parsing (*DagBag.process_file()*), it will just output the *root_dag*
> > > > > > > > > > > instead of *root_dag + subdag + subdag + nested subdag* etc.
> > > > > > > > > > >
> > > > > > > > > > > - e.g. section-1-* will have metadata current_group=section-1,
> > > > > > > > > > >   parent_group=<the-root-dag-id> (naming suggestions welcome). The
> > > > > > > > > > >   reason for parent_group is that we can have nested groups and still
> > > > > > > > > > >   be able to capture the dependency.
> > > > > > > > > >
> > > > > > > > > > Runtime DAG:
> > > > > > > > > > [image: image.png]
> > > > > > > > > >
> > > > > > > > > > > In the UI, by utilizing the metadata, what we see would be something
> > > > > > > > > > > like this, and then we can expand or zoom into it in some way.
> > > > > > > > > > > [image: image.png]
> > > > > > > > > >
> > > > > > > > > > > The benefits I can see are:
> > > > > > > > > > > 1. We don't need to deal with the extra complexity of SubDag for
> > > > > > > > > > >    execution and scheduling. It will be the same as not using SubDag.
> > > > > > > > > > > 2. We still have the benefits of modularized and reusable dag code and
> > > > > > > > > > >    can declare dependencies between them. And with the new
> > > > > > > > > > >    SubDagOperator (see AIP or draft PR), we can use the same dag_factory
> > > > > > > > > > >    function for generating 1 dag, a lot of dynamic dags, or a SubDag (in
> > > > > > > > > > >    this case, it will just extract all underlying tasks and append them
> > > > > > > > > > >    to the root dag).
> > > > > > > > > > >
> > > > > > > > > > > - Then it gets to the idea of replacing subdag with a simpler concept
> > > > > > > > > > >   by Ash: the proposed change basically drains out the contents of a
> > > > > > > > > > >   SubDag and becomes more like
> > > > > > > > > > >   ExtractSubdagTasksAndAppendToRootdagOperator (forgive me about the
> > > > > > > > > > >   crazy name..). In this case, is it still necessary to keep the
> > > > > > > > > > >   concept of subdag, as it is nothing more than a name?
> > > > > > > > > >
> > > > > > > > > > > That's why the TaskGroup idea comes up. Thanks Chris Palmer for helping
> > > > > > > > > > > conceptualize the functionality of TaskGroup, I will just paste it here.
> > > > > > > > > >
> > > > > > > > > > > > - Tasks can be added to a TaskGroup
> > > > > > > > > > > > - You *can* have dependencies between Tasks in the same TaskGroup, but
> > > > > > > > > > > >   *cannot* have dependencies between a Task in a TaskGroup and either a
> > > > > > > > > > > >   Task in a different TaskGroup or a Task not in any group
> > > > > > > > > > > > - You *can* have dependencies between a TaskGroup and either other
> > > > > > > > > > > >   TaskGroups or Tasks not in any group
> > > > > > > > > > > > - The UI will by default render a TaskGroup as a single "object", but
> > > > > > > > > > > >   which you expand or zoom into in some way
> > > > > > > > > > > > - You'd need some way to determine what the "status" of a TaskGroup was
> > > > > > > > > > > >   at least for UI display purposes
> > > > > > > > > >
> > > > > > > > > > > I agree with Chris:
> > > > > > > > > > > - From the backend's view (scheduler & executor), I think TaskGroup
> > > > > > > > > > >   should be ignored during execution (unless we decide to implement some
> > > > > > > > > > >   metadata operations that allow starting/stopping a group of tasks etc.).
> > > > > > > > > > > - From the UI's view, it should be able to pick up the individual tasks'
> > > > > > > > > > >   status and then determine the TaskGroup's status.
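
One possible way to derive a group's status from its tasks for display purposes is a simple priority rule, sketched below. The thread does not specify any rule; the state names and their ordering here are assumptions chosen for illustration, and `group_state` is a hypothetical helper, not an Airflow function.

```python
# Assumed display priority: the first state in this list that appears
# among the group's tasks is shown for the whole group.
STATE_PRIORITY = [
    "failed",
    "upstream_failed",
    "running",
    "queued",
    "scheduled",
    "no_status",
    "skipped",
    "success",
]


def group_state(task_states):
    """Return a single display state for a group, or None if it is empty.

    `task_states` is an iterable of state strings, one per task in the group.
    States missing from STATE_PRIORITY are ignored in this sketch.
    """
    present = set(task_states)
    for state in STATE_PRIORITY:
        if state in present:
            return state
    return None


download_group = ["success", "running", "success"]
print(group_state(download_group))
```

With this ordering a group only shows `success` once every task in it has succeeded, and a single failure is enough to mark the whole group as failed.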
> > > > > > > > > >
> > > > > > > > > > Bin
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman <
> > > > > > > > > > daniel.imberman@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > >> I hadn’t thought about using the `>>` operator to tie dags together but
> > > > > > > > > > >> I think that sounds pretty great! I wonder if we could essentially
> > > > > > > > > > >> write in the ability to set dependencies to all starter-tasks for that
> > > > > > > > > > >> DAG.
> > > > > > > > > > >>
> > > > > > > > > > >> I’m personally ok with SubDag being a mostly UI concept. It doesn’t
> > > > > > > > > > >> need to execute separately, you’re just adding more tasks to the queue
> > > > > > > > > > >> that will be executed when there are resources available.
> > > > > > > > > >>
> > > > > > > > > >> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <
> > > > > chris@crpalmer.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >> I agree that SubDAGs are an overly complex abstraction. I think what is
> > > > > > > > > > >> needed/useful is a TaskGroup concept. On a high level I think you want
> > > > > > > > > > >> this functionality:
> > > > > > > > > > >>
> > > > > > > > > > >> - Tasks can be added to a TaskGroup
> > > > > > > > > > >> - You *can* have dependencies between Tasks in the same TaskGroup, but
> > > > > > > > > > >>   *cannot* have dependencies between a Task in a TaskGroup and either a
> > > > > > > > > > >>   Task in a different TaskGroup or a Task not in any group
> > > > > > > > > > >> - You *can* have dependencies between a TaskGroup and either other
> > > > > > > > > > >>   TaskGroups or Tasks not in any group
> > > > > > > > > > >> - The UI will by default render a TaskGroup as a single "object", but
> > > > > > > > > > >>   one which you can expand or zoom into in some way
> > > > > > > > > > >> - You'd need some way to determine what the "status" of a TaskGroup was,
> > > > > > > > > > >>   at least for UI display purposes
> > > > > > > > > > >>
> > > > > > > > > > >> Not sure if it would need to be a top level object with its own database
> > > > > > > > > > >> table and model or just another attribute on tasks. I think you could
> > > > > > > > > > >> build it in a way such that from the scheduler's point of view a DAG with
> > > > > > > > > > >> TaskGroups doesn't get treated any differently. So it really just becomes
> > > > > > > > > > >> a shortcut for setting dependencies between sets of Tasks, and allows the
> > > > > > > > > > >> UI to simplify the rendering of the DAG structure.
> > > > > > > > > >>
> > > > > > > > > >> Chris
> > > > > > > > > >>
> > > > > > > > > >> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > > > > > > > <ddavydov@twitter.com.invalid
> > > > > > > > > >> >
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > > >> > Agree with James (and think it's actually the more important issue to
> > > > > > > > > > >> > fix), but I am still convinced Ash's idea is the right way forward (it
> > > > > > > > > > >> > just might require a bit more work to deprecate than adding visual
> > > > > > > > > > >> > grouping in the UI).
> > > > > > > > > > >> >
> > > > > > > > > > >> > There was a previous thread about this, FYI, with more context on why
> > > > > > > > > > >> > subdags are bad and potential solutions:
> > > > > > > > > > >> > https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > > > > > > > >> > A solution I outline there to James's problem is e.g. enabling the `>>`
> > > > > > > > > > >> > operator for Airflow operators to work with DAGs as well. I see this
> > > > > > > > > > >> > being separate from Ash's solution for DAG grouping in the UI, but it is
> > > > > > > > > > >> > one of the two items required to replace all existing subdag
> > > > > > > > > > >> > functionality.
> > > > > > > > > >> >
> > > > > > > > > >> > I've been working with subdags for 3 years and they
> are
> > > > > always a
> > > > > > > > giant
> > > > > > > > > >> pain
> > > > > > > > > >> > to use. They are a constant source of user confusion
> and
> > > > > > breakages
> > > > > > > > > >> during
> > > > > > > > > >> > upgrades. Would love to see them gone :).
> > > > > > > > > >> >
> > > > > > > > > >> > On Fri, Jun 12, 2020 at 11:11 AM James Coder <
> > > > > > jcoder01@gmail.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >> >
> > > > > > > > > > >> > > I'm not sure I totally agree it's just a UI concept. I use the subdag
> > > > > > > > > > >> > > operator to simplify dependencies too. If you have a group of tasks
> > > > > > > > > > >> > > that need to finish before another group of tasks start, using a subdag
> > > > > > > > > > >> > > is a pretty quick way to set those dependencies, and I think it also
> > > > > > > > > > >> > > makes it easier to follow the dag code.
> > > > > > > > > >> > >
> > > > > > > > > >> > > On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <
> > > > > > > hamlin.kn@gmail.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > > I second Ash’s grouping concept.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor
> <
> > > > > > > > ash@apache.org
> > > > > > > > > >
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > > Question:
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > Do we even need the SubDagOperator anymore?
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > Would removing it entirely and just replacing it
> > > with
> > > > a
> > > > > UI
> > > > > > > > > >> grouping
> > > > > > > > > >> > > > > concept be conceptually simpler, less to get
> > wrong,
> > > > and
> > > > > > > closer
> > > > > > > > > to
> > > > > > > > > >> > what
> > > > > > > > > >> > > > > users actually want to achieve with subdags?
> > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > With your proposed change, tasks in subdags could start
> > > > > > > > > > >> > > > > running in parallel (a good change) -- so should we not also
> > > > > > > > > > >> > > > > just _entirely_ remove the concept of a sub dag and replace it
> > > > > > > > > > >> > > > > with something simpler?
> > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Problems with subdags (I think. I haven't used them extensively
> > > > > > > > > > >> > > > > so may be wrong on some of these):
> > > > > > > > > > >> > > > > - They need their own dag_id, but it has(?) to be of the form
> > > > > > > > > > >> > > > >   `parent_dag_id.subdag_id`.
> > > > > > > > > > >> > > > > - They need their own schedule_interval, but it has to match the
> > > > > > > > > > >> > > > >   parent dag.
> > > > > > > > > > >> > > > > - Sub dags can be paused on their own. (Does it make sense to do
> > > > > > > > > > >> > > > >   this? Pausing just a sub dag would mean the sub dag would never
> > > > > > > > > > >> > > > >   execute, so the SubDagOperator would fail too.)
> > > > > > > > > > >> > > > > - You had to choose the executor to operate a subdag with --
> > > > > > > > > > >> > > > >   always a bit of a kludge.
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > Thoughts?
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > -ash
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <
> > > > > > > > ash@apache.org>
> > > > > > > > > >> > wrote:
> > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > > Work on sub-dags is much needed, I'm excited to see how this
> > > > > > > > > > >> > > > > > progresses.
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > >> - *Unpack SubDags during dag parsing*: This
> > > > rewrites
> > > > > > the
> > > > > > > > > >> > > > > *DagBag.bag_dag*
> > > > > > > > > >> > > > > >> method to unpack subdag while parsing, and it
> > > will
> > > > > > give a
> > > > > > > > > flat
> > > > > > > > > >> > > > > >> structure at
> > > > > > > > > >> > > > > >> the task level
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > The serialized_dag representation already does
> > > this
> > > > I
> > > > > > > think.
> > > > > > > > > At
> > > > > > > > > >> > least
> > > > > > > > > >> > > > if
> > > > > > > > > >> > > > > > I've understood your idea here correctly.
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > -ash
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > On Jun 12 2020, at 9:51 am, Xinbin Huang <
> > > > > > > > > bin.huangxb@gmail.com
> > > > > > > > > >> >
> > > > > > > > > >> > > > wrote:
> > > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > >> Hi everyone,
> > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> Sending a message to everyone to collect feedback on AIP-34 on
> > > > > > > > > > >> > > > > >> rewriting SubDagOperator. This was previously briefly mentioned in
> > > > > > > > > > >> > > > > >> the discussion about what needs to be done for Airflow 2.0, and one
> > > > > > > > > > >> > > > > >> of the ideas is to make SubDagOperator attach tasks back to the
> > > > > > > > > > >> > > > > >> root DAG.
> > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> This AIP-34 focuses on solving SubDagOperator related issues by
> > > > > > > > > > >> > > > > >> reattaching all tasks back to the root dag while respecting
> > > > > > > > > > >> > > > > >> dependencies during parsing. The original grouping effect on the UI
> > > > > > > > > > >> > > > > >> will be achieved through grouping related tasks by metadata.
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> This also makes the dag_factory function more reusable because you
> > > > > > > > > > >> > > > > >> don't need to have parent_dag_name and child_dag_name in the
> > > > > > > > > > >> > > > > >> function signature anymore.
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >> Changes proposed:
> > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> - *Unpack SubDags during dag parsing*: This rewrites the
> > > > > > > > > > >> > > > > >>   *DagBag.bag_dag* method to unpack subdag while parsing, and it
> > > > > > > > > > >> > > > > >>   will give a flat structure at the task level.
> > > > > > > > > > >> > > > > >> - *Simplify SubDagOperator*: The new SubDagOperator acts like a
> > > > > > > > > > >> > > > > >>   container and most of the original methods are removed. The
> > > > > > > > > > >> > > > > >>   signature is also changed to *subdag_factory* with *subdag_args*
> > > > > > > > > > >> > > > > >>   and *subdag_kwargs*. This is similar to the PythonOperator
> > > > > > > > > > >> > > > > >>   signature.
> > > > > > > > > > >> > > > > >> - *Add a TaskGroup model and add current_group & parent_group
> > > > > > > > > > >> > > > > >>   attributes to BaseOperator*: This metadata is used to group tasks
> > > > > > > > > > >> > > > > >>   for rendering at UI level. It may potentially extend further to
> > > > > > > > > > >> > > > > >>   group arbitrary tasks outside the context of subdag to allow
> > > > > > > > > > >> > > > > >>   group-level operations (i.e. stop/trigger a group of tasks within
> > > > > > > > > > >> > > > > >>   the dag).
> > > > > > > > > > >> > > > > >> - *Webserver UI for SubDag*: Proposed UI modification to allow
> > > > > > > > > > >> > > > > >>   (un)collapsing a group of tasks for a flat structure to pair with
> > > > > > > > > > >> > > > > >>   the first change instead of the original hierarchical structure.
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >> Please see related documents and PRs for
> > details:
> > > > > > > > > > >> > > > > >> AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> Original Issue: https://github.com/apache/airflow/issues/8078
> > > > > > > > > > >> > > > > >> Draft PR: https://github.com/apache/airflow/pull/9243
> > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> Please let me know if there are any aspects that you agree/disagree
> > > > > > > > > > >> > > > > >> with or need more clarification on (especially the third change
> > > > > > > > > > >> > > > > >> regarding TaskGroup). Any comments are welcome and I am looking
> > > > > > > > > > >> > > > > >> forward to them!
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > > >> Cheers
> > > > > > > > > >> > > > > >> Bin
> > > > > > > > > >> > > > > >>
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > --
> > > > > > > > > >> > > > Kyle Hamlin
> > > > > > > > > >> > > >
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Thanks & Regards
> > > > > > > > > Poornima
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by James Coder <jc...@gmail.com>.
Hi Qian,
Sounds good to me. I actually didn’t realize (or knew at one time and forgot) that you could collapse parts of the tree. I think having some mechanism to indicate the tasks are grouped would be nice, but agree this could be done incrementally through other PRs.
James

James Coder

> On Aug 14, 2020, at 4:45 AM, Yu Qian <yu...@gmail.com> wrote:
> 
> Re Xinbin: thanks. I'll edit the AIP. May I request permission to edit it?
> My wiki user email is yuqian1990@gmail.com.
> 
> Re Gerard: yes the UI loads all the nodes as json from the web server at
> once. However, it only adds the top level nodes and edges to the graph when
> the Graph View page is first opened. And then adds the expanded nodes to
> the graph as the user expands them. From what I've experienced with DAGs
> containing around 400 tasks (not using TaskGroup or SubDagOperator),
> opening the whole dag in Graph View usually takes 5 seconds. Less than 60ms
> of that is taken by loading the data from webserver. The remaining 4.9s+ is
> taken by javascript functions in dagre-d3.min.js such as createNodes,
> createEdgeLabels, etc. and by rendering the graph. With TaskGroup being used
> to group tasks into a smaller number of top-level nodes, the amount of data
> loaded from webserver will remain about the same compared to a flat dag of
> the same size, but the number of nodes and edges that need to be plotted on
> the graph can be reduced significantly. So in theory this should reduce the
> time it takes to open Graph View even without lazy-loading the data (I'll
> experiment to find out). That said, if it comes to a point where lazy-loading
> helps, we can still implement it as an improvement.
> 
> Re James: the Tree View looks as if all the groups are fully expanded
> (because under the hood all the tasks are in a single DAG). I'm less
> worried about Tree View at the moment because it already has a mechanism
> for collapsing tasks by the dependency tree. That said, the Tree View can
> definitely be improved too with TaskGroup. (e.g. collapse tasks in the same
> TaskGroup when Tree View is first opened).
> 
> For both suggestions, implementing them doesn't require fundamental changes
> to the idea. I think we can have a basic working TaskGroup first, and then
> improve it incrementally in several PRs as we get more feedback from the
> community. What do you think?
> 
> Qian
> 
> 
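
The point above about Graph View plotting fewer nodes and edges when groups
are collapsed can be illustrated with a minimal sketch. It is plain Python
rather than the JavaScript in the PR, and it is not the PR's implementation.
The names `visible_node`, `visible_edges`, `group_of` and `expanded` are
hypothetical, and the sketch assumes groups are not nested.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]


def visible_node(task: str, group_of: Dict[str, Optional[str]], expanded: Set[str]) -> str:
    """Return the task itself if it is ungrouped or its group is expanded,
    otherwise return the id of the collapsed group that hides it."""
    group = group_of.get(task)
    if group is None or group in expanded:
        return task
    return group


def visible_edges(edges: Iterable[Edge],
                  group_of: Dict[str, Optional[str]],
                  expanded: Set[str]) -> Set[Edge]:
    """Project task-level edges onto the nodes that are currently drawn."""
    shown: Set[Edge] = set()
    for upstream, downstream in edges:
        src = visible_node(upstream, group_of, expanded)
        dst = visible_node(downstream, group_of, expanded)
        if src != dst:  # edges fully inside a collapsed group are not drawn
            shown.add((src, dst))
    return shown


# Task-level DAG: the underlying edges never change, only their projection does.
edges = [("start", "a1"), ("a1", "a2"), ("a2", "end"), ("a1", "b1"), ("b1", "end")]
group_of = {"start": None, "end": None, "a1": "group_a", "a2": "group_a", "b1": "group_b"}

collapsed = visible_edges(edges, group_of, expanded=set())
partly = visible_edges(edges, group_of, expanded={"group_a"})
```

With everything collapsed the graph has four nodes and four edges; expanding
`group_a` brings back its two tasks and the edge between them. Cross-group
edges such as `a1 -> b1` are kept and simply re-pointed at the collapsed node.
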
>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jc...@gmail.com> wrote:
>> 
>> I agree this looks great, one question, how does the tree view look?
>> 
>> James Coder
>> 
>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gc...@twitter.com.invalid>
>> wrote:
>>> 
>>> First of all, this is awesome!!
>>> 
>>> Secondly, checking your UI code, it seems you are loading all operators at
>>> once. I'm wondering if we can load them as needed (i.e. load whenever we
>>> click the TaskGroup). Some of our DAGs are so large that they take forever
>>> to load in the Graph View, so I'm worried about this still being an issue
>>> here. It may be easily solvable by implementing lazy loading of the graph.
>>> I'm not sure how easy that is to implement/add to the UI extension (and I
>>> don't want to push for early optimization, as it's the root of all evil).
>>> Gerard Casas Saez
>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
>>> 
>>> 
>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bi...@gmail.com>
>> wrote:
>>>> 
>>>> Hi Yu,
>>>> 
>>>> Thank you so much for taking this on. I was fairly distracted previously
>>>> and I didn't have the time to update the proposal. In fact, after
>>>> discussing with Ash, Kaxil and Daniel, the direction of this AIP has been
>>>> changed to favor the concept of TaskGroup instead of rewriting
>>>> SubDagOperator (though it may make sense to deprecate SubDag at a future
>>>> date).
>>>>
>>>> Your PR is amazing and it has implemented the desired features. I think we
>>>> can focus on your new PR instead. Do you mind updating the AIP based on
>>>> what you have done in your PR?
>>>> 
>>>> Best,
>>>> Bin
>>>> 
>>>> 
>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yu...@gmail.com> wrote:
>>>>> 
>>>>> Hi all, I've added the basic UI changes to my proposed implementation of
>>>>> TaskGroup as a UI grouping concept:
>>>>> https://github.com/apache/airflow/pull/10153
>>>>>
>>>>> I think Chris had a pretty good specification of TaskGroup, so I'm quoting
>>>>> it here. The only thing I don't fully agree with is the restriction that
>>>>> you "*cannot* have dependencies between a Task in a TaskGroup and either a
>>>>> Task in a different TaskGroup or a Task not in any group". I think this is
>>>>> overly restrictive. Since TaskGroup is a UI concept, tasks can have
>>>>> dependencies on tasks in other TaskGroups or on tasks not in any TaskGroup.
>>>>> In my PR, this is allowed. The graph edges update accordingly when
>>>>> TaskGroups are expanded/collapsed. TaskGroup only helps to make the UI look
>>>>> less crowded. Under the hood, everything is still a DAG of tasks and edges,
>>>>> so things work normally. Here's a screenshot of the UI interaction:
>>>>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
>>>>> 
>>>>> - Tasks can be added to a TaskGroup
>>>>> - You *can* have dependencies between Tasks in the same TaskGroup, but
>>>>>   *cannot* have dependencies between a Task in a TaskGroup and either a
>>>>>   Task in a different TaskGroup or a Task not in any group
>>>>> - You *can* have dependencies between a TaskGroup and either other
>>>>>   TaskGroups or Tasks not in any group
>>>>> - The UI will by default render a TaskGroup as a single "object", but
>>>>>   which you expand or zoom into in some way
>>>>> - You'd need some way to determine what the "status" of a TaskGroup was
>>>>>   at least for UI display purposes
>>>>> 
>>>>> 
>>>>> Regarding Jake's comment, I agree it's possible to implement the "retrying
>>>>> tasks in a group" pattern he mentioned as an optional feature of TaskGroup,
>>>>> although that may go against having TaskGroup as a pure UI concept. For the
>>>>> motivating example Jake provided, I suggest implementing both
>>>>> SubmitLongRunningJobTask and PollJobStatusSensor in a single operator. It
>>>>> can do something like what BaseSensorOperator.execute() does in
>>>>> "reschedule" mode, i.e. it first executes some code to submit the
>>>>> long-running job to the external service and stores the state (e.g. in
>>>>> XCom), then reschedules itself. Subsequent runs then poke for the
>>>>> completion state.
>>>>> 
>>>>> 
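
The "submit once, then reschedule and poke" suggestion above can be sketched
as follows. This is a minimal simulation in plain Python under stated
assumptions, not Airflow code: `Reschedule`, `SubmitAndPollTask`, and the
`store` dict (standing in for XCom or any persisted state) are hypothetical
names, and the loop at the bottom plays the role of the scheduler re-running
the task.

```python
class Reschedule(Exception):
    """Hypothetical signal: give up the slot now and run this task again later."""


class SubmitAndPollTask:
    """Single task that submits a job on its first run and polls on later runs."""

    def __init__(self, submit, is_done, store):
        self.submit = submit    # callable() -> job id
        self.is_done = is_done  # callable(job_id) -> bool
        self.store = store      # dict standing in for persisted state (e.g. XCom)

    def execute(self):
        job_id = self.store.get("job_id")
        if job_id is None:
            # First run: submit the job, remember its id, then free the slot.
            self.store["job_id"] = self.submit()
            raise Reschedule()
        if not self.is_done(job_id):
            # Later runs: poke only; never resubmit while a job id is stored.
            raise Reschedule()
        return job_id


# Fake external service: the job reports completion on the third poll.
polls = {"count": 0}


def fake_submit():
    return "job-1"


def fake_is_done(job_id):
    polls["count"] += 1
    return polls["count"] >= 3


store = {}
task = SubmitAndPollTask(fake_submit, fake_is_done, store)

runs = 0
result = None
while result is None:  # stands in for the scheduler re-running the task
    runs += 1
    try:
        result = task.execute()
    except Reschedule:
        pass
```

To retry the submission after a failed poll, the stored job id would have to
be cleared before the next run; the sketch leaves that policy out.
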
>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
>>>> <jferriero@google.com.invalid
>>>>>> 
>>>>> wrote:
>>>>> 
>>>>>> I really like this idea of a TaskGroup container as I think this will be
>>>>>> much easier to use than SubDag.
>>>>>>
>>>>>> I'd like to propose an optional behavior for special retry mechanics via a
>>>>>> TaskGroup.retry_all property.
>>>>>> This way I could use TaskGroup to replace my favorite use of SubDag for
>>>>>> atomically retrying tasks of the pattern "act on external state then
>>>>>> reschedule poll until desired state reached".
>>>>>>
>>>>>> The motivating use case I have for a SubDag is a very simple two-task group
>>>>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
>>>>>> I use SubDag because it gives me an easy way to retry the SubmitJobTask
>>>>>> if something about the PollJobSensor fails.
>>>>>> This pattern would be really nice for jobs that are expected to run a long
>>>>>> time (because the sensor can use reschedule mode, freeing up slots)
>>>>>> but might fail for a retryable reason.
>>>>>> However, using SubDag to meet this use case defeats the purpose because
>>>>>> SubDag infamously
>>>>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
>>>>>> blocks a "controller" slot for the entire duration.
>>>>>> This may feel like a cyclic behavior but in reality it is very common for a
>>>>>> single operator to submit a job / wait until done.
>>>>>> We could use this case to refactor many operators (e.g. BQ, Dataproc,
>>>>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with an
>>>>>> optional reschedule mode if the user knows that this job may take a long
>>>>>> time.
>>>>>>
>>>>>> I'd be happy to do the development work on adding this specific retry
>>>>>> behavior to TaskGroup once the base concept is implemented, if others in
>>>>>> the community would find this a useful feature.
>>>>>> 
>>>>>> Cheers,
>>>>>> Jake
>>>>>> 
>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
>> Jarek.Potiuk@polidea.com
>>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> All for it :) . I think we are getting closer to having regular planning
>>>>>>> and a structured approach to 2.0, and to starting a task force for it
>>>>>>> soon, so I think it should be perfectly fine to discuss and even start
>>>>>>> implementing what's beyond as soon as we make sure that we are
>>>>>>> prioritizing 2.0 work.
>>>>>>> 
>>>>>>> J,
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yu...@gmail.com>
>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Jarek,
>>>>>>>> 
>>>>>>>> I agree we should not change the behaviour of the existing SubDagOperator
>>>>>>>> till Airflow 2.1. Is it okay to continue the discussion about TaskGroup as
>>>>>>>> a brand new concept/feature independent from the existing SubDagOperator?
>>>>>>>> In other words, shall we add TaskGroup as a UI grouping concept like Ash
>>>>>>>> suggested, and not touch SubDagOperator at all? Whenever we are ready with
>>>>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
>>>>>>>>
>>>>>>>> I really like Ash's idea of simplifying the SubDagOperator idea into a
>>>>>>>> simple UI grouping concept. I think Xinbin's idea of "reattaching all the
>>>>>>>> tasks to the root DAG" is the way to go. And I see James pointed out we
>>>>>>>> need some helper functions to simplify setting dependencies for TaskGroup.
>>>>>>>> Xinbin put up a pretty elegant example in his PR
>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I think having TaskGroup as
>>>>>>>> a UI concept should be a relatively small change. We can simplify Xinbin's
>>>>>>>> PR further. So I put up this alternative proposal here:
>>>>>>>> https://github.com/apache/airflow/pull/10153
>>>>>>>>
>>>>>>>> I have not done any UI changes due to lack of experience with web UI. If
>>>>>>>> anyone's interested, please take a look at the PR.
>>>>>>>> 
>>>>>>>> Qian
>>>>>>>> 
>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
>>>>> Jarek.Potiuk@polidea.com
>>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Similar point here to the other ideas that are popping up. Maybe we should
>>>>>>>>> just focus on completing 2.0 and move all discussions about further
>>>>>>>>> improvements to 2.1? While those are important discussions (and we should
>>>>>>>>> continue them in the near future!), I think at this point delivering 2.0
>>>>>>>>> in its current shape should be our focus.
>>>>>>>>> 
>>>>>>>>> J.
>>>>>>>>> 
>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
>>>>> bin.huangxb@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Daniel
>>>>>>>>>> 
>>>>>>>>>> I agree that the TaskGroup should have the same API as a DAG object
>>>>>>>>>> related to task dependencies, but it will not have anything related to
>>>>>>>>>> actual execution or scheduling.
>>>>>>>>>> I will update the AIP according to this over the weekend.
>>>>>>>>>>
>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when you import the
>>>>>>>>>>> object you can import it with parameters to determine the shape of the
>>>>>>>>>>> DAG.
>>>>>>>>>>
>>>>>>>>>> Can you elaborate a bit more on this? Does it serve a similar purpose as
>>>>>>>>>> a DAG factory function?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
>>>>>>>>> daniel.imberman@gmail.com
>>>>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Bin,
>>>>>>>>>>> 
>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g. the bitwise
>>>>>>>>>>> operator for task dependencies)? We could even make a “DAGTemplate”
>>>>>>>>>>> object s.t. when you import the object you can import it with parameters
>>>>>>>>>>> to determine the shape of the DAG.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
>>>>>>> bin.huangxb@gmail.com
>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> The TaskGroup will not take schedule interval as a parameter itself; it
>>>>>>>>>>> depends on the DAG it is attached to. In my opinion, the TaskGroup will
>>>>>>>>>>> only contain a group of tasks with interdependencies, and the TaskGroup
>>>>>>>>>>> behaves like a task. It doesn't contain any execution/scheduling logic
>>>>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs etc.) like a DAG
>>>>>>>>>>> does.
>>>>>>>>>>>
>>>>>>>>>>>> For example, there is the scenario that the schedule interval of DAG is
>>>>>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 min.
>>>>>>>>>>>
>>>>>>>>>>> I am curious why you ask this. Is this a use case that you want to
>>>>>>>>>>> achieve?
>>>>>>>>>>> 
>>>>>>>>>>> Bin
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
>>>> thanosxnicholas@gmail.com
>>>>>> 
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>> Using TaskGroup, is the schedule interval of the TaskGroup the same as
>>>>>>>>>>>> that of the parent DAG? My main concern is whether the schedule interval
>>>>>>>>>>>> of a TaskGroup could be different from that of the DAG. For example,
>>>>>>>>>>>> there is the scenario where the schedule interval of the DAG is 1 hour
>>>>>>>>>>>> and the schedule interval of the TaskGroup is 20 min.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Nicholas
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
>>>>>>>> bin.huangxb@gmail.com
>>>>>>>>>> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Nicholas,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I am not sure about the old behavior of SubDagOperator; maybe it will
>>>>>>>>>>>>> throw an error? But in the original proposal, the subdag's
>>>>>>>>>>>>> schedule_interval will be ignored. Or if we decide to use TaskGroup to
>>>>>>>>>>>>> replace SubDag, there will be no subdag schedule_interval.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Bin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
>>>>>> thanosxnicholas@gmail.com
>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>> Thanks for your good proposal. I was confused about whether the schedule
>>>>>>>>>>>>>> interval of a SubDAG can be different from that of the parent DAG. I
>>>>>>>>>>>>>> have discussed the schedule interval of SubDAG with Jiajie Zhong. If
>>>>>>>>>>>>>> the SubDagOperator has a different schedule interval, what will happen
>>>>>>>>>>>>>> when the scheduler schedules the parent DAG?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Nicholas Jiang
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
>>>>>>>>>> bin.huangxb@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback!
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I have rethought the concept of subdag and task groups. I think the
>>>>>>>>>>>>>>> better way to approach this is to entirely remove subdag and introduce
>>>>>>>>>>>>>>> the concept of TaskGroup, which is a container of tasks along with
>>>>>>>>>>>>>>> their dependencies *without execution/scheduling logic as a DAG*. The
>>>>>>>>>>>>>>> only purpose of it is to group a list of tasks, but you still need to
>>>>>>>>>>>>>>> add it to a DAG for execution.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Here is a small code snippet.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> class TaskGroup:
>>>>>>>>>>>>>>>     """
>>>>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
>>>>>>>>>>>>>>>     """
>>>>>>>>>>>>>>>     def __init__(self, group_id, default_args=None):
>>>>>>>>>>>>>>>         pass
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # You can add tasks to a task group similar to adding tasks to a DAG.
>>>>>>>>>>>>>>> # This can be declared in a separate file from the dag file.
>>>>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
>>>>>>>>>>>>>>> download_group.add_task(task1)
>>>>>>>>>>>>>>> task2.dag = download_group
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> with download_group:
>>>>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [task1, task2] >> task3
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # Add it to a DAG for execution
>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
>>>>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
>>>>>>>>>>>>>>>     start = DummyOperator(task_id='start')
>>>>>>>>>>>>>>>     start >> download_group
>>>>>>>>>>>>>>>     # this is equivalent to
>>>>>>>>>>>>>>>     # start >> [task1, task2] >> task3
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> With this, we can still reuse a group of tasks and set dependencies
>>>>>>>>>>>>>>> between them; it avoids the boilerplate code from using SubDagOperator,
>>>>>>>>>>>>>>> and we can declare dependencies as `task >> task_group >> task`.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> User migration wise, we can introduce it before Airflow 2.0 and allow
>>>>>>>>>>>>>>> gradual transition. Then we can decide if we still want to keep the
>>>>>>>>>>>>>>> SubDagOperator or simply remove it.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Any thoughts?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Bin
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
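
The idea that `start >> download_group` is shorthand for wiring `start` to
the group's first tasks can be sketched in plain Python. This is only an
illustration under stated assumptions, not the proposed Airflow API: `Task`,
`link`, `roots`, `leaves` and `set_downstream` are hypothetical helpers, and
the bitwise `>>` operator is replaced by an explicit function call to keep
the example short.

```python
class Task:
    """Bare-bones stand-in for an operator; tracks neighbours by task_id."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream_ids = set()
        self.downstream_ids = set()


def link(upstream, downstream):
    """Record a single task-to-task dependency."""
    upstream.downstream_ids.add(downstream.task_id)
    downstream.upstream_ids.add(upstream.task_id)


class TaskGroup:
    """Container of tasks; holds no scheduling or execution logic."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = {}

    def add_task(self, task):
        self.tasks[task.task_id] = task

    def roots(self):
        """Tasks with no upstream task inside this group."""
        return [t for t in self.tasks.values() if t.upstream_ids.isdisjoint(self.tasks)]

    def leaves(self):
        """Tasks with no downstream task inside this group."""
        return [t for t in self.tasks.values() if t.downstream_ids.isdisjoint(self.tasks)]


def set_downstream(upstream, downstream):
    """Wire upstream to downstream, where either side is a Task or a TaskGroup."""
    ups = upstream.leaves() if isinstance(upstream, TaskGroup) else [upstream]
    downs = downstream.roots() if isinstance(downstream, TaskGroup) else [downstream]
    for u in ups:
        for d in downs:
            link(u, d)


task1, task2, task3 = Task("task1"), Task("task2"), Task("task3")
download_group = TaskGroup("download")
for t in (task1, task2, task3):
    download_group.add_task(t)
link(task1, task3)
link(task2, task3)

start, end = Task("start"), Task("end")
set_downstream(start, download_group)  # same as start -> [task1, task2]
set_downstream(download_group, end)    # same as task3 -> end
```

Because only ordinary task-to-task edges are recorded, a scheduler that knows
nothing about groups would see the same flat DAG either way.
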
>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
>>>>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> +1, proposal looks good.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The original intention was really to have task groups and a zoom-in/out
>>>>>>>>>>>>>>>> in the UI. The original reasoning was to reuse the DAG object since it
>>>>>>>>>>>>>>>> is a group of tasks, but as highlighted here it does create underlying
>>>>>>>>>>>>>>>> confusion since a DAG is much more than just a group of tasks.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Max
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
>>>>>>>>>>>>>>> joshipoornima06@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thank you for your email.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
>>>>>>>>>>>>> bin.huangxb@gmail.com
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* method to unpack subdag while parsing, and it will
>>>>>>>>>>>>>>>>>>>> give a flat structure at the task level
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I think. At least
>>>>>>>>>>>>>>>>>>> if I've understood your idea here correctly.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I am not sure about the serialized_dag representation, but at least it
>>>>>>>>>>>>>>>>>> will still keep the subdag entry in the DAG table? In my proposal, as
>>>>>>>>>>>>>>>>>> also in the draft PR, the idea is to *extract the tasks from the subdag
>>>>>>>>>>>>>>>>>> and add them back to the root_dag*. So the runtime DAG graph will look
>>>>>>>>>>>>>>>>>> exactly the same as without subdag, but with metadata attached to those
>>>>>>>>>>>>>>>>>> sections. This metadata will later be used for rendering in the UI. So
>>>>>>>>>>>>>>>>>> after parsing (*DagBag.process_file()*), it will just output the
>>>>>>>>>>>>>>>>>> *root_dag* instead of *root_dag + subdag + subdag + nested subdag* etc.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata current_group=section-1,
>>>>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (naming suggestions welcome). The
>>>>>>>>>>>>>>>>>> reason for parent_group is that we can have nested groups and still
>>>>>>>>>>>>>>>>>> be able to capture the dependency.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Runtime DAG:
>>>>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> While at the UI, what we see would be something like this by utilizing
>>>>>>>>>>>>>>>>>> the metadata, and then we can expand or zoom into it in some way.
>>>>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The benefits I can see are:
>>>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra complexity of SubDag for
>>>>>>>>>>>>>>>>>> execution and scheduling. It will be the same as not using SubDag.
>>>>>>>>>>>>>>>>>> 2. We still have the benefits of modularized and reusable dag code and
>>>>>>>>>>>>>>>>>> can declare dependencies between them. And with the new SubDagOperator
>>>>>>>>>>>>>>>>>> (see AIP or draft PR), we can use the same dag_factory function for
>>>>>>>>>>>>>>>>>> generating 1 dag, a lot of dynamic dags, or a SubDag (in this case, it
>>>>>>>>>>>>>>>>>> will just extract all underlying tasks and append them to the root
>>>>>>>>>>>>>>>>>> dag).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag with a simpler concept
>>>>>>>>>>>>>>>>>> by Ash: the proposed change basically drains out the contents of a
>>>>>>>>>>>>>>>>>> SubDag and becomes more like
>>>>>>>>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator (forgive me about the
>>>>>>>>>>>>>>>>>> crazy name..). In this case, is it still necessary to keep the concept
>>>>>>>>>>>>>>>>>> of subdag, as it is nothing more than a name?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks to Chris Palmer for
>>>>>>>>>>>>>>>>>> helping conceptualize the functionality of TaskGroup; I will just
>>>>>>>>>>>>>>>>>> paste it here.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in the same TaskGroup, but
>>>>>>>>>>>>>>>>>>>   *cannot* have dependencies between a Task in a TaskGroup and either a
>>>>>>>>>>>>>>>>>>>   Task in a different TaskGroup or a Task not in any group
>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either other
>>>>>>>>>>>>>>>>>>>   TaskGroups or Tasks not in any group
>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup as a single "object", but
>>>>>>>>>>>>>>>>>>>   which you expand or zoom into in some way
>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the "status" of a TaskGroup was
>>>>>>>>>>>>>>>>>>>   at least for UI display purposes
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I agree with Chris:
>>>>>>>>>>>>>>>>>> - From the backend's view (scheduler & executor), I think TaskGroup
>>>>>>>>>>>>>>>>>> should be ignored during execution (unless we decide to implement some
>>>>>>>>>>>>>>>>>> metadata operations that allow starting/stopping a group of tasks etc.)
>>>>>>>>>>>>>>>>>> - From the UI's view, it should be able to pick up the individual
>>>>>>>>>>>>>>>>>> tasks' status and then determine the TaskGroup's status
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Bin
>>>>>>>>>>>>>>>>>> 
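
One way the UI could derive a TaskGroup's status from its individual tasks'
statuses is a simple priority rule. The sketch below is an assumption for
illustration only: the state names and their ordering in `STATE_PRIORITY` are
hypothetical choices, not something the thread or Airflow prescribes.

```python
# Hypothetical ordering: the first state in this list that any member task
# has becomes the status shown for the whole group.
STATE_PRIORITY = [
    "failed",
    "upstream_failed",
    "up_for_retry",
    "running",
    "queued",
    "scheduled",
    "no_status",
    "skipped",
    "success",
]


def group_status(task_states):
    """Collapse the states of a group's tasks into one display status.

    A state of None (task has not run yet) is treated as "no_status".
    States missing from STATE_PRIORITY are ignored.
    """
    states = {s if s is not None else "no_status" for s in task_states}
    for state in STATE_PRIORITY:
        if state in states:
            return state
    return "no_status"


all_done = group_status(["success", "success"])
in_progress = group_status(["success", "running", None])
broken = group_status(["success", "failed", "running"])
empty = group_status([])
```

Since the rule only reads task states, it fits the view that the scheduler
and executor can ignore TaskGroup entirely.
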
>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
>>>> Imberman
>>>>> <
>>>>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator to tie dags together but
>>>>>>>>>>>>>>>>>>> I think that sounds pretty great! I wonder if we could essentially
>>>>>>>>>>>>>>>>>>> write in the ability to set dependencies to all starter-tasks for that
>>>>>>>>>>>>>>>>>>> DAG.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly UI concept. It doesn’t
>>>>>>>>>>>>>>>>>>> need to execute separately, you’re just adding more tasks to the queue
>>>>>>>>>>>>>>>>>>> that will be executed when there are resources available.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
>>>> <
>>>>>>>>>>>>> chris@crpalmer.com
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex abstraction. I think what is
>>>>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a high level I think you want
>>>>>>>>>>>>>>>>>>> this functionality:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in the same TaskGroup, but
>>>>>>>>>>>>>>>>>>>   *cannot* have dependencies between a Task in a TaskGroup and either a
>>>>>>>>>>>>>>>>>>>   Task in a different TaskGroup or a Task not in any group
>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either other
>>>>>>>>>>>>>>>>>>>   TaskGroups or Tasks not in any group
>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup as a single "object", but
>>>>>>>>>>>>>>>>>>>   which you expand or zoom into in some way
>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the "status" of a TaskGroup was
>>>>>>>>>>>>>>>>>>>   at least for UI display purposes
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Not sure if it would need to be a top level object with its own database
>>>>>>>>>>>>>>>>>>> table and model or just another attribute on tasks. I think you could
>>>>>>>>>>>>>>>>>>> build it in a way such that from the scheduler's point of view a DAG
>>>>>>>>>>>>>>>>>>> with TaskGroups doesn't get treated any differently. So it really just
>>>>>>>>>>>>>>>>>>> becomes a shortcut for setting dependencies between sets of Tasks, and
>>>>>>>>>>>>>>>>>>> allows the UI to simplify the render of the DAG structure.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
>>>>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Agree with James (and think it's actually the more important issue to
>>>>>>>>>>>>>>>>>>>> fix), but I am still convinced Ash's idea is the right way forward
>>>>>>>>>>>>>>>>>>>> (just it might require a bit more work to deprecate than adding visual
>>>>>>>>>>>>>>>>>>>> grouping in the UI).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> There was a previous thread about this FYI with more context on why
>>>>>>>>>>>>>>>>>>>> subdags are bad and potential solutions:
>>>>>>>>>>>>>>>>>>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html . A
>>>>>>>>>>>>>>>>>>>> solution I outline there to James's problem is e.g. enabling the `>>`
>>>>>>>>>>>>>>>>>>>> operator for Airflow operators to work with DAGs as well. I see this
>>>>>>>>>>>>>>>>>>>> being separate from Ash's solution for DAG grouping in the UI but one
>>>>>>>>>>>>>>>>>>>> of the two items required to replace all existing subdag functionality.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years and they are always a giant
>>>>>>>>>>>>>>>>>>>> pain to use. They are a constant source of user confusion and breakages
>>>>>>>>>>>>>>>>>>>> during upgrades. Would love to see them gone :).
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
>>>> Coder <
>>>>>>>>>>>>>> jcoder01@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a UI concept. I use the subdag
>>>>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If you have a group of tasks
>>>>>>>>>>>>>>>>>>>>> that need to finish before another group of tasks start, using a
>>>>>>>>>>>>>>>>>>>>> subdag is a pretty quick way to set those dependencies and I think
>>>>>>>>>>>>>>>>>>>>> also makes it easier to follow the dag code.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
>>>> Hamlin
>>>>> <
>>>>>>>>>>>>>>> hamlin.kn@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
>>>>>>>> Berlin-Taylor
>>>>>>>>> <
>>>>>>>>>>>>>>>> ash@apache.org
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Question:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator anymore?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just replacing it with a UI
>>>>>>>>>>>>>>>>>>>>>>> grouping concept be conceptually simpler, less to get wrong,
>>>>>>>>>>>>>>>>>>>>>>> and closer to what users actually want to achieve with
>>>>>>>>>>>>>>>>>>>>>>> subdags?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in subdags could start
>>>>>>>>>>>>>>>>>>>>>>> running in parallel (a good change) -- so should we not also
>>>>>>>>>>>>>>>>>>>>>>> just _entirely_ remove the concept of a sub dag and replace
>>>>>>>>>>>>>>>>>>>>>>> it with something simpler?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I haven't used them
>>>>>>>>>>>>>>>>>>>>>>> extensively so may be wrong on some of these):
>>>>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it has(?) to be of the
>>>>>>>>>>>>>>>>>>>>>>>   form `parent_dag_id.subdag_id`.
>>>>>>>>>>>>>>>>>>>>>>> - They need their own schedule_interval, but it has to match
>>>>>>>>>>>>>>>>>>>>>>>   the parent dag.
>>>>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own. (Does it make sense
>>>>>>>>>>>>>>>>>>>>>>>   to do this? Pausing just a sub dag would mean the sub dag
>>>>>>>>>>>>>>>>>>>>>>>   would never execute, so the SubDagOperator would fail
>>>>>>>>>>>>>>>>>>>>>>>   too.)
>>>>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to operate a subdag with --
>>>>>>>>>>>>>>>>>>>>>>>   always a bit of a kludge.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -ash
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <ash@apache.org>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm excited to see how
>>>>>>>>>>>>>>>>>>>>>>>> this progresses.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
>>>>>>>>>>>>>>>>>>>>>>>>>   *DagBag.bag_dag* method to unpack subdag while parsing,
>>>>>>>>>>>>>>>>>>>>>>>>>   and it will give a flat structure at the task level
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I
>>>>>>>>>>>>>>>>>>>>>>>> think. At least if I've understood your idea here
>>>>>>>>>>>>>>>>>>>>>>>> correctly.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> -ash
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huangxb@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone to collect feedback on AIP-34
>>>>>>>>>>>>>>>>>>>>>>>>> on rewriting SubDagOperator. This was previously briefly
>>>>>>>>>>>>>>>>>>>>>>>>> mentioned in the discussion about what needs to be done for
>>>>>>>>>>>>>>>>>>>>>>>>> Airflow 2.0, and one of the ideas is to make SubDagOperator
>>>>>>>>>>>>>>>>>>>>>>>>> attach tasks back to the root DAG.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving SubDagOperator related issues
>>>>>>>>>>>>>>>>>>>>>>>>> by reattaching all tasks back to the root dag while
>>>>>>>>>>>>>>>>>>>>>>>>> respecting dependencies during parsing. The original
>>>>>>>>>>>>>>>>>>>>>>>>> grouping effect on the UI will be achieved through grouping
>>>>>>>>>>>>>>>>>>>>>>>>> related tasks by metadata.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory function more reusable
>>>>>>>>>>>>>>>>>>>>>>>>> because you don't need to have parent_dag_name and
>>>>>>>>>>>>>>>>>>>>>>>>> child_dag_name in the function signature anymore.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
>>>>>>>>>>>>>>>>>>>>>>>>>   *DagBag.bag_dag* method to unpack subdag while parsing,
>>>>>>>>>>>>>>>>>>>>>>>>>   and it will give a flat structure at the task level
>>>>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts
>>>>>>>>>>>>>>>>>>>>>>>>>   like a container and most of the original methods are
>>>>>>>>>>>>>>>>>>>>>>>>>   removed. The signature is also changed to
>>>>>>>>>>>>>>>>>>>>>>>>>   *subdag_factory* with *subdag_args* and *subdag_kwargs*.
>>>>>>>>>>>>>>>>>>>>>>>>>   This is similar to the PythonOperator signature.
>>>>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group &
>>>>>>>>>>>>>>>>>>>>>>>>>   parent_group attributes to BaseOperator*: This metadata
>>>>>>>>>>>>>>>>>>>>>>>>>   is used to group tasks for rendering at UI level. It may
>>>>>>>>>>>>>>>>>>>>>>>>>   potentially extend further to group arbitrary tasks
>>>>>>>>>>>>>>>>>>>>>>>>>   outside the context of subdag to allow group-level
>>>>>>>>>>>>>>>>>>>>>>>>>   operations (i.e. stop/trigger a group of tasks within the
>>>>>>>>>>>>>>>>>>>>>>>>>   dag)
>>>>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to
>>>>>>>>>>>>>>>>>>>>>>>>>   allow (un)collapse a group of tasks for a flat structure
>>>>>>>>>>>>>>>>>>>>>>>>>   to pair with the first change instead of the original
>>>>>>>>>>>>>>>>>>>>>>>>>   hierarchical structure.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Please see related documents and PRs for details:
>>>>>>>>>>>>>>>>>>>>>>>>> AIP:
>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
>>>>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any aspects that you
>>>>>>>>>>>>>>>>>>>>>>>>> agree/disagree with or need more clarification on
>>>>>>>>>>>>>>>>>>>>>>>>> (especially the third change regarding TaskGroup). Any
>>>>>>>>>>>>>>>>>>>>>>>>> comments are welcome and I am looking forward to them!
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>> Bin
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> Kyle Hamlin
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Thanks & Regards
>>>>>>>>>>>>>>>>> Poornima
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 
>>>>>>>>> Jarek Potiuk
>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>>>>> 
>>>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
>>>>>>> <+48%20660%20796%20129>>
>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> 
>>>>>>> Jarek Potiuk
>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>>> 
>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
>>>>>>> <+48%20660%20796%20129>>
>>>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> *Jacob Ferriero*
>>>>>> 
>>>>>> Strategic Cloud Engineer: Data Engineering
>>>>>> 
>>>>>> jferriero@google.com
>>>>>> 
>>>>>> 617-714-2509
>>>>>> 
>>>>> 
>>>> 
>> 
>> 


Re: [AIP-34] Rewrite SubDagOperator

Posted by Yu Qian <yu...@gmail.com>.
Okay. On one hand, we want to automatically prefix task_id so that users
don't have to parametrize task_id themselves inside TaskGroup to maintain
task_id uniqueness. On the other hand, we don't want people to be surprised
when they introduce TaskGroup to an existing DAG and all of a sudden the
task_ids of existing tasks become prefixed with group_id.

It's actually not difficult to have the best of both worlds. Like Gerard
suggested, we can add an option prefix_group_id=True/False to TaskGroup to
control whether child tasks should have their task_id prefixed with
group_id automatically. That way, if users want the Plan A behaviour, they
can set prefix_group_id=True. They can set it to False to achieve Plan B
behaviour. I think it makes sense to make prefix_group_id=True the default
since that's the behaviour AIP-34 already described. Setting
prefix_group_id to False is mostly for keeping the task_ids of existing
DAGs unchanged when adopting TaskGroup.

I'll update the PR <https://github.com/apache/airflow/pull/10153> to have
the option prefix_group_id=True/False unless I hear objections.
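
To make the intended behaviour concrete, here is a rough, self-contained
sketch of how the effective task_id could be resolved. It does not import
Airflow and is not the code in the PR: the GroupSketch class, its method
names, and the "." separator are assumptions made up purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupSketch:
    """Hypothetical stand-in for a TaskGroup (illustration only)."""

    group_id: str
    parent: Optional["GroupSketch"] = None
    prefix_group_id: bool = True

    def qualified_group_id(self) -> str:
        # A nested group is itself a child of its parent, so its own id
        # is resolved through the parent's prefixing rule.
        if self.parent is None:
            return self.group_id
        return self.parent.child_id(self.group_id)

    def child_id(self, label: str) -> str:
        # Plan A: prefix the label with the (possibly nested) group id.
        # Plan B: return the label untouched.
        if self.prefix_group_id:
            return f"{self.qualified_group_id()}.{label}"
        return label


# Plan A behaviour (prefix_group_id=True, the proposed default)
section_1 = GroupSketch("section_1")
inner = GroupSketch("inside_section_1", parent=section_1)
plan_a_id = inner.child_id("task-1")

# Plan B behaviour (prefix_group_id=False): the task_id stays unchanged
flat = GroupSketch("section_1", prefix_group_id=False)
plan_b_id = flat.child_id("task-1")
```

With the flag on, the same create_section() helper can be reused in several
groups without task_id clashes. With the flag off, an existing task keeps its
task_id when it is moved into a group, so its history stays attached.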


On Wed, Sep 2, 2020 at 12:08 AM Gerard Casas Saez
<gc...@twitter.com.invalid> wrote:

> As I mentioned in the issue, I believe prefixing group_id is a nice thing
> as it makes TaskGroup an equivalent for SubDagOperator. Internally we have
> a similar concept to TaskGroup called FlattenedSubDagOperator that
> appends the group_id to the task_id.
>
> One of the main usages internally for this operator is hyperparameter
> tuning ML models. For that we provide an abstraction where users provide a
> SubDag that takes in a dictionary of hyperparameters (through XComArg) and
> pushes an XCom that is a dictionary of metrics. This task group is usually
> a combination of model training and model analysis, but it can be whatever
> you want. We create a hyperparameter tuning DAG for the user easily by
> instantiating this SubDag/TaskGroup many times for the number of
> experiments they need to perform.
>
> Now, if group_id is not appended to task_id, this type of reuse of task
> groups would not be possible. You would need to ask the user to parametrize
> task_id and that's a bit counterintuitive as Airflow task_ids are not
> templatized. Another option is to make this behaviour customizable and have
> a flag that activates it on the TaskGroup.
>
>
> Gerard Casas Saez
> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
>
>
> On Tue, Sep 1, 2020 at 1:03 AM Yu Qian <yu...@gmail.com> wrote:
>
> > The vote for this AIP-34
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
> > passed. However, there's an interesting discussion going on here
> > <https://github.com/apache/airflow/pull/10153#discussion_r480247681>
> > regarding whether task_id should be automatically prefixed with the
> > group_id of its TaskGroup. So I'm bringing it up in this email thread for
> > discussion.
> >
> > Plan A: Prefix task_id with group_id of TaskGroup. This is the original
> > plan in AIP-34
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> > The task_id argument passed to an operator just needs to be unique within
> > the TaskGroup. The actual task_id is prefixed with the group_id, so task_id
> > is guaranteed to be unique across the DAG.
> >
> > Plan B: Do not prefix task_id with group_id of TaskGroup. The task_id
> > argument passed to the operator is the actual task_id. So the user is
> > forced to make sure task_id is unique across the whole DAG.
> >
> > Obviously the convenience of Plan A is not free of charge. I’m summarizing
> > some of the pros and cons in this table. There are two examples at the
> > bottom illustrating the different usage. I was convinced by houqp in the
> > GitHub comments and by some of my own experiments that Plan B has more
> > advantages and avoids surprises. I'm going to update AIP-34
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
> > according to Plan B unless I hear strong objections before 2020-09-03
> > 07:00 UTC.
> >
> >
> >
> >
> > Ease of Use
> >   Plan A: Easier to use for new DAGs.
> >   Plan B: Slightly more work for the user to maintain task_id uniqueness.
> >
> > Implementation
> >   Plan A: A little more complicated. Each group needs to know its parent’s
> >           group_id in order to prefix the group_id correctly.
> >   Plan B: Implementation is simpler. No need to know the parent
> >           TaskGroup’s group_id.
> >
> > Ease of Migration
> >   Plan A: task_id will change if TaskGroup is introduced into an existing
> >           DAG. Existing tasks put into a TaskGroup will appear like new
> >           tasks if the DAG already has some historical DagRun. This may
> >           pose a barrier to adoption of TaskGroup.
> >   Plan B: No change in task_id when an existing task is put into a
> >           TaskGroup. Migrating existing DAGs to adopt TaskGroup will be
> >           easier.
> >
> > Actual task_id
> >   Plan A: Actual task_id tends to be longer because it’s always prefixed
> >           with group_id, especially if the task is in a nested TaskGroup.
> >   Plan B: Actual task_id tends to be shorter because users control the
> >           actual task_id themselves.
> >
> > Graph label
> >   Plan A: Labels on Graph View tend to be shorter because task_id only
> >           needs to be unique within the TaskGroup.
> >   Plan B: Labels on Graph View tend to be longer because they display the
> >           actual task_id, which is a unique str across the DAG.
> >
> >
> > Plan A Example:
> >
> > def create_section():
> >     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(5)]
> >
> >     with TaskGroup("inside_section_1") as inside_section_1:
> >         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(3)]
> >
> >     with TaskGroup("inside_section_2") as inside_section_2:
> >         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(3)]
> >
> >     dummies[-1] >> inside_section_1
> >     dummies[-2] >> inside_section_2
> >     inside_section_1 >> inside_section_2
> >
> >
> > with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> >     start = DummyOperator(task_id="start")
> >
> >     with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
> >         create_section()
> >
> >     some_other_task = DummyOperator(task_id="some-other-task")
> >
> >     with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
> >         create_section()
> >
> >     end = DummyOperator(task_id='end')
> >
> >     start >> section_1 >> some_other_task >> section_2 >> end
> >
> >
> > Plan B Example:
> >
> > def create_section(section_num):
> >     dummies = [
> >         DummyOperator(task_id=f'task-{section_num}.{i + 1}')
> >         for i in range(5)
> >     ]
> >
> >     with TaskGroup(f"section_{section_num}.1") as inside_section_1:
> >         _ = [
> >             DummyOperator(task_id=f'task-{section_num}.1.{i + 1}')
> >             for i in range(3)
> >         ]
> >
> >     with TaskGroup(f"section_{section_num}.2") as inside_section_2:
> >         _ = [
> >             DummyOperator(task_id=f'task-{section_num}.2.{i + 1}')
> >             for i in range(3)
> >         ]
> >
> >     dummies[-1] >> inside_section_1
> >     dummies[-2] >> inside_section_2
> >     inside_section_1 >> inside_section_2
> >
> >
> > with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> >     start = DummyOperator(task_id="start")
> >
> >     with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
> >         create_section(1)
> >
> >     some_other_task = DummyOperator(task_id="some-other-task")
> >
> >     with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
> >         create_section(2)
> >
> >     end = DummyOperator(task_id='end')
> >
> >     start >> section_1 >> some_other_task >> section_2 >> end
> >
> >
> > On Sat, Aug 22, 2020 at 1:02 AM Gerard Casas Saez
> > <gc...@twitter.com.invalid> wrote:
> >
> > > Agree on this being non-blocking.
> > >
> > > Regarding moving to a vote, you can take care of it. Just open a new
> > > email thread on the dev list and call for a vote. You can see this
> > > example from Tomek for AIP-31:
> > >
> > >
> >
> https://lists.apache.org/thread.html/r16ee2b6263f5849a2e2b1b6c3ba39fb3d643022f052216cb19a0a8da%40%3Cdev.airflow.apache.org%3E
> > >
> > > Best,
> > >
> > >
> > > Gerard Casas Saez
> > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >
> > >
> > > On Thu, Aug 20, 2020 at 7:10 PM Yu Qian <yu...@gmail.com> wrote:
> > >
> > > > Hi, Gerard, yes I agree it's possible to do this at UI level without any
> > > > fundamental change to the implementation. If expand_group() sees that two
> > > > groups are fully connected (i.e. every task in one parent group depends on
> > > > every task in another parent group), it can decide to collapse all those
> > > > children edges into a single edge between the parent groups to reduce the
> > > > burden of the layout() function. However, I did not find any existing
> > > > algorithm to do this within dagre so we'll likely need to implement this
> > > > ourselves. Another hiccup is that at the moment it doesn't seem to be
> > > > possible to call setEdge() between two parent groups (aka clusters). If
> > > > someone has ideas how to do this please feel free to contribute.
> > > >
> > > > One other consideration is that this example is only an extreme case.
> > > > There are other in-between cases that still require user intervention.
> > > > Let's say 90% of tasks in group1 depend on 90% of tasks in group2 and both
> > > > groups have more than 100 tasks. This will still cause a lot of edges on
> > > > the graph and it's even harder to reduce because the parent groups are not
> > > > fully connected so it's inaccurate to reduce them to a single edge between
> > > > the parents. In those cases, the user may still need to do something
> > > > themselves, e.g. adding some DummyOperator to the DAG to cut down the
> > > > edges. There will be some tradeoff because DummyOperator takes a short
> > > > while to execute like you mentioned.
> > > >
> > > > There is a lot of room for improvement, but I don't think that's a
> > > > blocking issue for this AIP? So if you can move it to the voting stage
> > > > that'll be fantastic.
> > > >
> > > >
> > > > On Thu, Aug 20, 2020 at 4:21 PM 耀 周 <zh...@icloud.com.invalid>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > > On Aug 18, 2020, at 23:55, Gerard Casas Saez
> > > > > > <gcasassaez@twitter.com.INVALID> wrote:
> > > > > >
> > > > > > Is it not possible to solve this at the UI level? Aka tell dagre to
> > > > > > only add 1 edge to the group instead of to all nodes in the group? No
> > > > > > need to do SubDag behaviour, but just reduce the edges on the graph.
> > > > > > Should reduce load time if I understand correctly.
> > > > > >
> > > > > > I would strongly avoid the Dummy operator since it will introduce
> > > > > > delays on operator execution (as it will need to execute 1 dummy
> > > > > > operator and that can be expensive imo).
> > > > > >
> > > > > > Overall though proposal looks good, unless anyone opposes it, I would
> > > > > > move this to vote mode :D
> > > > > >
> > > > > > Gerard Casas Saez
> > > > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > > >
> > > > > >
> > > > > > On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yu...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > >> Hi, All,
> > > > > > >> Here's the updated AIP-34
> > > > > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> > > > > > >> The PR has been fine-tuned with better UI interactions and added
> > > > > > >> serialization of TaskGroup:
> > > > > > >> https://github.com/apache/airflow/pull/10153
> > > > > > >>
> > > > > > >> Here are some experiment results:
> > > > > > >> A made-up dag containing 403 tasks and 5696 edges, grouped as in
> > > > > > >> the code at the bottom of this email. Note that inside_section_2 is
> > > > > > >> intentionally made to depend on all tasks in inside_section_1 to
> > > > > > >> generate a large number of edges. The observation is that opening
> > > > > > >> the top level graph is very quick, around 270ms. Expanding groups
> > > > > > >> that don't have a lot of dense dependencies on other groups is also
> > > > > > >> hardly noticeable. E.g. expanding section_1 takes 330ms. The part
> > > > > > >> that takes time is expanding both groups inside_section_1 and
> > > > > > >> inside_section_2. Because there are 2500 edges between these two
> > > > > > >> inner groups, it took 63 seconds to expand both of them. The
> > > > > > >> majority of the time (more than 62 seconds) is actually taken by
> > > > > > >> the layout() function in dagre. In other words, it's very fast to
> > > > > > >> add nodes and edges, but laying them out on the graph takes time.
> > > > > > >> This issue is not actually a problem specific to TaskGroup. Without
> > > > > > >> TaskGroup, if a DAG contains too many edges, it takes time to
> > > > > > >> lay out the graph too.
> > > > > > >>
> > > > > > >> On the other hand, a more realistic experiment with a production
> > > > > > >> DAG containing about 400 tasks and 700 edges showed that grouping
> > > > > > >> tasks into three levels of nested TaskGroup cut the upfront page
> > > > > > >> opening time from around 6s to 500ms. (Obviously the time is paid
> > > > > > >> back when the user gradually expands all the groups one by one, but
> > > > > > >> normally people don't need to expand every group every time so it's
> > > > > > >> still a big saving). The experiments were done on OS X Mojave,
> > > > > > >> 2.2 GHz Intel Core i7, 16GB Memory, Chrome.
> > > > > > >>
> > > > > > >> I can see a few possible improvements to TaskGroup (or how it's
> > > > > > >> used) that can be done as a next step:
> > > > > > >> 1). Like Gerard suggested, we can implement lazy-loading. Instead
> > > > > > >> of displaying the whole DAG, we can limit the Graph View to show
> > > > > > >> only a single TaskGroup, omitting its edges going out to other
> > > > > > >> TaskGroups. This behaviour is more like SubDagOperator where users
> > > > > > >> can zoom into/out of a TaskGroup and look at only tasks within that
> > > > > > >> TaskGroup as if those are the only tasks on the DAG. This can be
> > > > > > >> done with either background javascript calls or by making a new get
> > > > > > >> request with filtering parameters. Obviously the downside is that
> > > > > > >> it's not as explicit as showing all the dependencies on the graph.
> > > > > > >> 2). Users can improve the organization of the DAG themselves to
> > > > > > >> reduce the number of edges. E.g. if every task in group2 depends on
> > > > > > >> every task in group1, instead of doing group1 >> group2, they can
> > > > > > >> add a DummyOperator in between and do this:
> > > > > > >> group1 >> dummy >> group2. This cuts down the number of edges
> > > > > > >> significantly and page load becomes much faster.
> > > > > > >> 3). If we really want, we can improve the >> operator of TaskGroup
> > > > > > >> to do 2) automatically. If it sees that both sides of >> are
> > > > > > >> TaskGroup, it can create a DummyOperator on behalf of the user. The
> > > > > > >> downside is that it may be too much magic.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Qian
> > > > > >>
> > > > > > >> def create_section():
> > > > > > >>     """
> > > > > > >>     Create tasks in the outer section.
> > > > > > >>     """
> > > > > > >>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
> > > > > > >>
> > > > > > >>     with TaskGroup("inside_section_1") as inside_section_1:
> > > > > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
> > > > > > >>
> > > > > > >>     with TaskGroup("inside_section_2") as inside_section_2:
> > > > > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
> > > > > > >>
> > > > > > >>     dummies[-1] >> inside_section_1
> > > > > > >>     dummies[-2] >> inside_section_2
> > > > > > >>     inside_section_1 >> inside_section_2
> > > > > > >>
> > > > > > >>
> > > > > > >> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> > > > > > >>     start = DummyOperator(task_id="start")
> > > > > > >>
> > > > > > >>     with TaskGroup("section_1") as section_1:
> > > > > > >>         create_section()
> > > > > > >>
> > > > > > >>     some_other_task = DummyOperator(task_id="some-other-task")
> > > > > > >>
> > > > > > >>     with TaskGroup("section_2") as section_2:
> > > > > > >>         create_section()
> > > > > > >>
> > > > > > >>     end = DummyOperator(task_id='end')
> > > > > > >>
> > > > > > >>     start >> section_1 >> some_other_task >> section_2 >> end
> > > > > >>
> > > > > >>
> > > > > >> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
> > > > > >> <gc...@twitter.com.invalid> wrote:
> > > > > >>
> > > > > > >>> Re graph times. That makes sense. Let me know what you find. We
> > > > > > >>> may be able to contribute on the lazy loading part.
> > > > > > >>>
> > > > > > >>> Looking forward to seeing the updated AIP!
> > > > > >>>
> > > > > >>>
> > > > > >>> Gerard Casas Saez
> > > > > >>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > > >>>
> > > > > >>>
> > > > > >>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <
> kaxilnaik@gmail.com>
> > > > > wrote:
> > > > > >>>
> > > > > >>>> Permissions granted, let me know if you face any issues.
> > > > > >>>>
> > > > > >>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yuqian1990@gmail.com
> >
> > > > wrote:
> > > > > >>>>
> > > > > >>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank
> > you!
> > > > > >>>>>
> > > > > >>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <
> > kaxilnaik@gmail.com>
> > > > > >>> wrote:
> > > > > >>>>>
> > > > > >>>>>> What's your ID i.e. if you haven't created an account yet,
> > > please
> > > > > >>>> create
> > > > > >>>>>> one at https://cwiki.apache.org/confluence/signup.action
> and
> > > send
> > > > > >> us
> > > > > >>>>> your
> > > > > >>>>>> ID and we will add permissions.
> > > > > >>>>>>
> > > > > >>>>>> Thanks. I'll edit the AIP. May I request permission to edit
> > it?
> > > > > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <
> yuqian1990@gmail.com
> > >
> > > > > >>> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Re, Xinbin. Thanks. I'll edit the AIP. May I request
> > permission
> > > > > >> to
> > > > > >>>> edit
> > > > > >>>>>> it?
> > > > > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > > > > >>>>>>>
> > > > > >>>>>>> Re Gerard: yes the UI loads all the nodes as json from the
> > web
> > > > > >>> server
> > > > > >>>>> at
> > > > > >>>>>>> once. However, it only adds the top level nodes and edges
> to
> > > the
> > > > > >>>> graph
> > > > > >>>>>> when
> > > > > >>>>>>> the Graph View page is first opened. And then adds the
> > expanded
> > > > > >>> nodes
> > > > > >>>>> to
> > > > > >>>>>>> the graph as the user expands them. From what I've
> > experienced
> > > > > >> with
> > > > > >>>>> DAGs
> > > > > >>>>>>> containing around 400 tasks (not using TaskGroup or
> > > > > >>> SubDagOperator),
> > > > > >>>>>>> opening the whole dag in Graph View usually takes 5
> seconds.
> > > Less
> > > > > >>>> than
> > > > > >>>>>> 60ms
> > > > > >>>>>>> of that is taken by loading the data from webserver. The
> > > > > >> remaining
> > > > > >>>>> 4.9s+
> > > > > >>>>>> is
> > > > > >>>>>>> taken by javascript functions in dagre-d3.min.js such as
> > > > > >>> createNodes,
> > > > > >>>>>>> createEdgeLabels, etc and by rendering the graph. With
> > > TaskGroup
> > > > > >>>> being
> > > > > >>>>>> used
> > > > > >>>>>>> to group tasks into a smaller number of top-level nodes,
> the
> > > > > >> amount
> > > > > >>>> of
> > > > > >>>>>> data
> > > > > >>>>>>> loaded from webserver will remain about the same compared
> to
> > a
> > > > > >> flat
> > > > > >>>> dag
> > > > > >>>>>> of
> > > > > >>>>>>> the same size, but the number of nodes and edges needed to
> be
> > > > > > >> plotted
> > > > > >>> on
> > > > > >>>>> the
> > > > > >>>>>>> graph can be reduced significantly. So in theory this
> should
> > > > > >> speed
> > > > > >>> up
> > > > > >>>>> the
> > > > > >>>>>>> time it takes to open Graph View even without lazy-loading
> > the
> > > > > >> data
> > > > > >>>>> (I'll
> > > > > >>>>>>> experiment to find out). That said, if it comes to a point
> > > > > >>>> lazy-loading
> > > > > >>>>>>> helps, we can still implement it as an improvement.
> > > > > >>>>>>>
> > > > > > >>>>>>> Re James: the Tree View looks as if all the groups are
> > > fully
> > > > > >>>>>> expanded.
> > > > > >>>>>>> (because under the hood all the tasks are in a single DAG).
> > I'm
> > > > > >>> less
> > > > > >>>>>>> worried about Tree View at the moment because it already
> has
> > a
> > > > > >>>>> mechanism
> > > > > >>>>>>> for collapsing tasks by the dependency tree. That said, the
> > > Tree
> > > > > >>> View
> > > > > >>>>> can
> > > > > >>>>>>> definitely be improved too with TaskGroup. (e.g. collapse
> > tasks
> > > > > >> in
> > > > > >>>> the
> > > > > >>>>>> same
> > > > > >>>>>>> TaskGroup when Tree View is first opened).
> > > > > >>>>>>>
> > > > > > >>>>>>> For both suggestions, implementing them doesn't require
> > > fundamental
> > > > > >>>>> changes
> > > > > >>>>>>> to the idea. I think we can have a basic working TaskGroup
> > > first,
> > > > > >>> and
> > > > > >>>>>> then
> > > > > >>>>>>> improve it incrementally in several PRs as we get more
> > feedback
> > > > > >>> from
> > > > > >>>>> the
> > > > > >>>>>>> community. What do you think?
> > > > > >>>>>>>
> > > > > >>>>>>> Qian
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <
> > > jcoder01@gmail.com>
> > > > > >>>>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> I agree this looks great, one question, how does the tree
> > view
> > > > > >>>> look?
> > > > > >>>>>>>>
> > > > > >>>>>>>> James Coder
> > > > > >>>>>>>>
> > > > > >>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <
> > > > > >>>>>> gcasassaez@twitter.com
> > > > > >>>>>>> .invalid>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> First of all, this is awesome!!
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Secondly, checking your UI code, seems you are loading
> all
> > > > > >>>>> operators
> > > > > >>>>>> at
> > > > > >>>>>>>>> once. Wondering if we can load them as needed (aka load
> > > > > >>> whenever
> > > > > >>>> we
> > > > > >>>>>>> click
> > > > > > >>>>>>>>> the TaskGroup). Some of our DAGs are so large that they take
> > > > > >> forever
> > > > > >>>> to
> > > > > >>>>>> load
> > > > > >>>>>>>> on
> > > > > >>>>>>>>> the Graph view, so worried about this still being an
> issue
> > > > > >>> here.
> > > > > >>>> It
> > > > > >>>>>> may
> > > > > >>>>>>>> be
> > > > > >>>>>>>>> easily solvable by implementing lazy loading of the
> graph.
> > > > > >> Not
> > > > > >>>> sure
> > > > > >>>>>> how
> > > > > > >>>>>>>>> easy to implement/add to the UI extension (and don't want
> to
> > > > > >>> push
> > > > > >>>>> for
> > > > > >>>>>>>> early
> > > > > > >>>>>>>>> optimization as it's the root of all evil).
> > > > > >>>>>>>>> Gerard Casas Saez
> > > > > >>>>>>>>> Twitter | Cortex | @casassaez <
> > http://twitter.com/casassaez>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <
> > > > > >>>>>> bin.huangxb@gmail.com>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Hi Yu,
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Thank you so much for taking on this. I was fairly
> > > > > >> distracted
> > > > > >>>>>>> previously
> > > > > >>>>>>>>>> and I didn't have the time to update the proposal. In
> > fact,
> > > > > >>>> after
> > > > > >>>>>>>>>> discussing with Ash, Kaxil and Daniel, the direction of
> > this
> > > > > >>> AIP
> > > > > >>>>> has
> > > > > >>>>>>>> been
> > > > > >>>>>>>>>> changed to favor the concept of TaskGroup instead of
> > > > > >> rewriting
> > > > > > >>>>>>>>>> SubDagOperator (though it may make sense to deprecate
> > SubDag
> > > > > >>> in a
> > > > > >>>>>>> future
> > > > > >>>>>>>>>> date.).
> > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Your PR is amazing and it has implemented the desired
> > > > > >>> features. I
> > > > > >>>>>> think
> > > > > >>>>>>>> we
> > > > > >>>>>>>>>> can focus on your new PR instead. Do you mind updating
> the
> > > > > >> AIP
> > > > > >>>>> based
> > > > > >>>>>>> on
> > > > > >>>>>>>>>> what you have done in your PR?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Best,
> > > > > >>>>>>>>>> Bin
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <
> > > > > >>> yuqian1990@gmail.com>
> > > > > >>>>>>> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Hi, all, I've added the basic UI changes to my proposed
> > > > > >>>>>>> implementation
> > > > > >>>>>>>> of
> > > > > >>>>>>>>>>> TaskGroup as UI grouping concept:
> > > > > >>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup so I'm
> > > > > > >>>>>>>>>>> quoting it here. The only thing I don't fully agree with is the
> > > > > > >>>>>>>>>>> restriction "... *cannot* have dependencies between a Task in a
> > > > > > >>>>>>>>>>> TaskGroup and either a Task in a different TaskGroup or a Task not
> > > > > > >>>>>>>>>>> in any group". I think this is overly restrictive. Since TaskGroup
> > > > > > >>>>>>>>>>> is a UI concept, tasks can have dependencies on tasks in other
> > > > > > >>>>>>>>>>> TaskGroups or not in any TaskGroup. In my PR, this is allowed. The
> > > > > > >>>>>>>>>>> graph edges will update accordingly when TaskGroups are
> > > > > > >>>>>>>>>>> expanded/collapsed. TaskGroup is only helping to make the UI look
> > > > > > >>>>>>>>>>> less crowded. Under the hood, everything is still a DAG of tasks
> > > > > > >>>>>>>>>>> and edges so things work normally. Here's a screenshot
> > > > > > >>>>>>>>>>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > > > > > >>>>>>>>>>> of the UI interaction.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> *   - Tasks can be added to a TaskGroup   - You *can*
> > have
> > > > > >>>>>>> dependencies
> > > > > >>>>>>>>>>> between Tasks in the same TaskGroup, but   *cannot*
> have
> > > > > >>>>>> dependencies
> > > > > >>>>>>>>>>> between a Task in a TaskGroup and either a   Task in a
> > > > > >>>> different
> > > > > >>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>> or a Task not in any group   - You *can* have
> > dependencies
> > > > > >>>>> between
> > > > > >>>>>> a
> > > > > >>>>>>>>>>> TaskGroup and either other   TaskGroups or Tasks not in
> > any
> > > > > >>>> group
> > > > > >>>>>> -
> > > > > >>>>>>>> The
> > > > > >>>>>>>>>>> UI will by default render a TaskGroup as a single
> > "object",
> > > > > >>> but
> > > > > >>>>>>> which
> > > > > >>>>>>>>>> you
> > > > > >>>>>>>>>>> expand or zoom into in some way   - You'd need some way
> > to
> > > > > >>>>>> determine
> > > > > >>>>>>>> what
> > > > > >>>>>>>>>>> the "status" of a TaskGroup was   at least for UI
> display
> > > > > >>>>> purposes*
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Regarding Jake's comment, I agree it's possible to
> > > > > >> implement
> > > > > >>>> the
> > > > > >>>>>>>>>> "retrying
> > > > > >>>>>>>>>>> tasks in a group" pattern he mentioned as an optional
> > > > > >> feature
> > > > > >>>> of
> > > > > >>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>> although that may go against having TaskGroup as a pure
> > UI
> > > > > >>>>> concept.
> > > > > >>>>>>> For
> > > > > >>>>>>>>>> the
> > > > > >>>>>>>>>>> motivating example Jake provided, I suggest
> implementing
> > > > > >> both
> > > > > >>>>>>>>>>> SubmitLongRunningJobTask and PollJobStatusSensor in a
> > > > > >> single
> > > > > >>>>>>> operator.
> > > > > >>>>>>>> It
> > > > > >>>>>>>>>>> can do something like BaseSensorOperator.execute() does
> > in
> > > > > >>>>>>> "reschedule"
> > > > > >>>>>>>>>>> mode, i.e. it first executes some code to submit the
> long
> > > > > >>>> running
> > > > > >>>>>> job
> > > > > >>>>>>>> to
> > > > > >>>>>>>>>>> the external service, and store the state (e.g. in
> XCom).
> > > > > >>> Then
> > > > > >>>>>>>> reschedule
> > > > > > >>>>>>>>>>> itself. Subsequent runs then poke for the completion
> > > > > >> state.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> > > > > >>>>>>>>>> <jferriero@google.com.invalid
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> I really like this idea of a TaskGroup container as I
> > > > > >> think
> > > > > >>>> this
> > > > > >>>>>>> will
> > > > > >>>>>>>>>> be
> > > > > >>>>>>>>>>>> much easier to use than SubDag.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> I'd like to propose an optional behavior for special
> > retry
> > > > > >>>>>> mechanics
> > > > > >>>>>>>>>> via
> > > > > >>>>>>>>>>> a
> > > > > >>>>>>>>>>>> TaskGroup.retry_all property.
> > > > > >>>>>>>>>>>> This way I could use TaskGroup to replace my favorite
> > use
> > > > > >> of
> > > > > >>>>>> SubDag
> > > > > >>>>>>>> for
> > > > > >>>>>>>>>>>> atomically retrying tasks of the pattern "act on
> > external
> > > > > >>>> state
> > > > > >>>>>> then
> > > > > >>>>>>>>>>>> reschedule poll until desired state reached".
> > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>> Motivating use case I have for a SubDag is a very simple
> > two
> > > > > >>>> task
> > > > > >>>>>>> group
> > > > > >>>>>>>>>>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > > > > >>>>>>>>>>>> I use SubDag because it gives me an easy way to
> retry
> > > > > >> the
> > > > > >>>>>>>>>>> SubmitJobTask
> > > > > >>>>>>>>>>>> if something about the PollJobSensor fails.
> > > > > >>>>>>>>>>>> This pattern would be really nice for jobs that are
> > > > > >> expected
> > > > > >>>> to
> > > > > >>>>>> run
> > > > > >>>>>>> a
> > > > > >>>>>>>>>>> long
> > > > > >>>>>>>>>>>> time (because we can use sensor can use reschedule
> mode
> > > > > >>>> freeing
> > > > > >>>>> up
> > > > > >>>>>>>>>> slots)
> > > > > >>>>>>>>>>>> but might fail for a retryable reason.
> > > > > >>>>>>>>>>>> However, using SubDag to meet this use case defeats
> the
> > > > > >>>> purpose
> > > > > >>>>>>>> because
> > > > > >>>>>>>>>>>> SubDag infamously
> > > > > >>>>>>>>>>>> <
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>> blocks a "controller" slot for the entire duration.
> > > > > > >>>>>>>>>>>> This may feel like a cyclic behavior but in reality it is
> > > > > >> very
> > > > > >>>>> common
> > > > > >>>>>>> for
> > > > > >>>>>>>>>> a
> > > > > >>>>>>>>>>>> single operator to submit job / wait til done.
> > > > > > >>>>>>>>>>>> We could use this case to refactor many operators (e.g.
> BQ,
> > > > > >>>>> Dataproc,
> > > > > >>>>>>>>>>>> Dataflow) to be implemented as TaskGroup[SubmitTask >>
> > > > > >>>> PollTask]
> > > > > >>>>>>> with
> > > > > >>>>>>>>>> an
> > > > > >>>>>>>>>>>> optional reschedule mode if user knows that this job
> may
> > > > > >>> take
> > > > > >>>> a
> > > > > >>>>>> long
> > > > > >>>>>>>>>>> time.
> > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>> I'd be happy to do the development work on adding this
> > > > > >> specific
> > > > > >>>>> retry
> > > > > >>>>>>>>>>> behavior
> > > > > >>>>>>>>>>>> to TaskGroup once the base concept is implemented if
> > > > > >> others
> > > > > >>> in
> > > > > >>>>> the
> > > > > >>>>>>>>>>>> community would find this a useful feature.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> Cheers,
> > > > > >>>>>>>>>>>> Jake
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> > > > > >>>>>>>> Jarek.Potiuk@polidea.com
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> All for it :) . I think we are getting closer to have
> > > > > >>> regular
> > > > > >>>>>>>>>> planning
> > > > > >>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>> making some structured approach to 2.0 and starting
> > task
> > > > > >>>> force
> > > > > >>>>>> for
> > > > > >>>>>>> it
> > > > > >>>>>>>>>>>> soon,
> > > > > >>>>>>>>>>>>> so I think this should be perfectly fine to discuss
> and
> > > > > >>> even
> > > > > >>>>>> start
> > > > > >>>>>>>>>>>>> implementing what's beyond as soon as we make sure
> that
> > > > > >> we
> > > > > >>>> are
> > > > > >>>>>>>>>>>> prioritizing
> > > > > >>>>>>>>>>>>> 2.0 work.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> J,
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <
> > > > > >>>> yuqian1990@gmail.com>
> > > > > >>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Hi Jarek,
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I agree we should not change the behaviour of the
> > > > > >> existing
> > > > > >>>>>>>>>>>> SubDagOperator
> > > > > >>>>>>>>>>>>>> till Airflow 2.1. Is it okay to continue the
> > discussion
> > > > > >>>> about
> > > > > >>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>> as
> > > > > >>>>>>>>>>>>>> a brand new concept/feature independent from the
> > > > > >> existing
> > > > > >>>>>>>>>>>> SubDagOperator?
> > > > > >>>>>>>>>>>>>> In other words, shall we add TaskGroup as a UI
> > grouping
> > > > > >>>>> concept
> > > > > >>>>>>>>>> like
> > > > > >>>>>>>>>>>> Ash
> > > > > > >>>>>>>>>>>>>> suggested, and not touch SubDagOperator at all.
> > > > > >> Whenever
> > > > > >>> we
> > > > > >>>>> are
> > > > > >>>>>>>>>>> ready
> > > > > >>>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>> TaskGroup, we then deprecate SubDagOperator in
> Airflow
> > > > > >>> 2.1.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I really like Ash's idea of simplifying the
> > > > > >> SubDagOperator
> > > > > >>>>> idea
> > > > > >>>>>>>>>> into
> > > > > >>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>> simple UI grouping concept. I think Xinbin's idea of
> > > > > >>>>>> "reattaching
> > > > > >>>>>>>>>> all
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>> tasks to the root DAG" is the way to go. And I see
> > James
> > > > > >>>>> pointed
> > > > > >>>>>>>>>> out
> > > > > >>>>>>>>>>> we
> > > > > >>>>>>>>>>>>>> need some helper functions to simplify dependencies
> > > > > >>> setting
> > > > > >>>> of
> > > > > >>>>>>>>>>>> TaskGroup.
> > > > > >>>>>>>>>>>>>> Xinbin put up a pretty elegant example in his PR
> > > > > >>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I
> > think
> > > > > >>>> having
> > > > > >>>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>> as
> > > > > >>>>>>>>>>>>>> a UI concept should be a relatively small change. We
> > can
> > > > > >>>>>> simplify
> > > > > >>>>>>>>>>>>> Xinbin's
> > > > > >>>>>>>>>>>>>> PR further. So I put up this alternative proposal
> > here:
> > > > > >>>>>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I have not done any UI changes due to lack of
> > experience
> > > > > >>>> with
> > > > > >>>>>> web
> > > > > >>>>>>>>>> UI.
> > > > > >>>>>>>>>>>> If
> > > > > >>>>>>>>>>>>>> anyone's interested, please take a look at the PR.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Qian
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > > > > >>>>>>>>>>> Jarek.Potiuk@polidea.com
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Similar point here to the other ideas that are
> > popping
> > > > > >>> up.
> > > > > >>>>>> Maybe
> > > > > >>>>>>>>>> we
> > > > > >>>>>>>>>>>>>> should
> > > > > >>>>>>>>>>>>>>> just focus on completing 2.0 and make all
> discussions
> > > > > >>> about
> > > > > >>>>>>>>>> further
> > > > > >>>>>>>>>>>>>>> improvements to 2.1? While those are important
> > > > > >>> discussions
> > > > > >>>>> (and
> > > > > >>>>>>>>>> we
> > > > > >>>>>>>>>>>>> should
> > > > > > >>>>>>>>>>>>>>> continue them in the near future!) I think at
> this
> > > > > >>> point
> > > > > >>>>>>>>>> focusing
> > > > > >>>>>>>>>>>> on
> > > > > >>>>>>>>>>>>>>> delivering 2.0 in its current shape should be our
> > focus
> > > > > >>>> now ?
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> J.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > > > > >>>>>>>>>>> bin.huangxb@gmail.com>
> > > > > >>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Hi Daniel
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same
> API
> > > > > >> as a
> > > > > >>>> DAG
> > > > > >>>>>>>>>>> object
> > > > > >>>>>>>>>>>>>>> related
> > > > > >>>>>>>>>>>>>>>> to task dependencies, but it will not have
> anything
> > > > > >>>> related
> > > > > >>>>> to
> > > > > >>>>>>>>>>>> actual
> > > > > >>>>>>>>>>>>>>>> execution or scheduling.
> > > > > >>>>>>>>>>>>>>>> I will update the AIP according to this over the
> > > > > >>> weekend.
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t.
> when
> > > > > >> you
> > > > > >>>>>>>>>> import
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>> object
> > > > > >>>>>>>>>>>>>>>> you can import it with parameters to determine the
> > > > > >> shape
> > > > > >>>> of
> > > > > >>>>>> the
> > > > > >>>>>>>>>>>> DAG.
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it
> serve
> > a
> > > > > >>>>> similar
> > > > > >>>>>>>>>>>> purpose
> > > > > >>>>>>>>>>>>>> as
> > > > > >>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>> DAG factory function?
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > > > > >>>>>>>>>>>>>>> daniel.imberman@gmail.com
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> Hi Bin,
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG
> > > > > >> object
> > > > > >>>>> (e.g.
> > > > > >>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>> bitwise
> > > > > > >>>>>>>>>>>>>>>>> operator for task dependencies). We could even
> > make a
> > > > > >>>>>>>>>>>> “DAGTemplate”
> > > > > >>>>>>>>>>>>>>>> object
> > > > > >>>>>>>>>>>>>>>>> s.t. when you import the object you can import it
> > > > > >> with
> > > > > >>>>>>>>>>> parameters
> > > > > >>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>> determine the shape of the DAG.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > > > > >>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>> The TaskGroup will not take schedule interval as
> a
> > > > > >>>>> parameter
> > > > > >>>>>>>>>>>>> itself,
> > > > > >>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>> depends on the DAG where it attaches to. In my
> > > > > >> opinion,
> > > > > >>>> the
> > > > > >>>>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>> only contain a group of tasks with
> > interdependencies,
> > > > > >>> and
> > > > > >>>>> the
> > > > > >>>>>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>>>>>> behaves like a task. It doesn't contain any
> > > > > >>>>>>>>>>> execution/scheduling
> > > > > >>>>>>>>>>>>>> logic
> > > > > >>>>>>>>>>>>>>>>> (i.e. schedule_interval, concurrency,
> > max_active_runs
> > > > > >>>> etc.)
> > > > > >>>>>>>>>>> like
> > > > > >>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>> DAG
> > > > > >>>>>>>>>>>>>>>>> does.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> For example, there is the scenario that the
> > schedule
> > > > > >>>>>>>>>> interval
> > > > > >>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>> DAG
> > > > > >>>>>>>>>>>>>>> is
> > > > > >>>>>>>>>>>>>>>>> 1 hour and the schedule interval of TaskGroup is
> 20
> > > > > >>> min.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case
> > > > > >> that
> > > > > >>>> you
> > > > > >>>>>>>>>> want
> > > > > >>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>> achieve?
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> Bin
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> > > > > >>>>>>>>>> thanosxnicholas@gmail.com
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> Hi Bin,
> > > > > >>>>>>>>>>>>>>>>>> Using TaskGroup, Is the schedule interval of
> > > > > >> TaskGroup
> > > > > >>>> the
> > > > > >>>>>>>>>>> same
> > > > > >>>>>>>>>>>>> as
> > > > > >>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>> parent DAG? My main concern is whether the
> > schedule
> > > > > >>>>>>>>>> interval
> > > > > >>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>> TaskGroup
> > > > > > >>>>>>>>>>>>>>>>>> could be different from that of the DAG? For
> > > > > >> example,
> > > > > >>>>> there
> > > > > >>>>>>>>>>> is
> > > > > >>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>> scenario
> > > > > >>>>>>>>>>>>>>>>>> that the schedule interval of DAG is 1 hour and
> > the
> > > > > >>>>>>>>>> schedule
> > > > > >>>>>>>>>>>>>> interval
> > > > > >>>>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>>>> TaskGroup is 20 min.
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> Cheers,
> > > > > >>>>>>>>>>>>>>>>>> Nicholas
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > > > > >>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> Hi Nicholas,
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of
> > > > > >>> SubDagOperator,
> > > > > >>>>>>>>>>> maybe
> > > > > >>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>>> throw
> > > > > >>>>>>>>>>>>>>>>>>> an error? But in the original proposal, the
> > > > > >> subdag's
> > > > > >>>>>>>>>>>>>>>> schedule_interval
> > > > > >>>>>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to
> > > > > >>> replace
> > > > > >>>>>>>>>>>> SubDag,
> > > > > >>>>>>>>>>>>>>> there
> > > > > >>>>>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>>>> be no subdag schedule_interval.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> Bin
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > > > > >>>>>>>>>>>> thanosxnicholas@gmail.com
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> Hi Bin,
> > > > > >>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was confused
> > > > > >>> whether
> > > > > >>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>> schedule
> > > > > >>>>>>>>>>>>>>>>>>>> interval of SubDAG is different from that of
> the
> > > > > >>>> parent
> > > > > >>>>>>>>>>>> DAG?
> > > > > >>>>>>>>>>>>> I
> > > > > >>>>>>>>>>>>>>> have
> > > > > >>>>>>>>>>>>>>>>>>>> discussed with Jiajie Zhong about the schedule
> > > > > >>>> interval
> > > > > >>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>> SubDAG.
> > > > > >>>>>>>>>>>>>>>> If
> > > > > >>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>> SubDagOperator has a different schedule
> > interval,
> > > > > >>> what
> > > > > >>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>> happen
> > > > > >>>>>>>>>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>> scheduler to schedule the parent DAG?
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> Regards,
> > > > > >>>>>>>>>>>>>>>>>>>> Nicholas Jiang
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > > > >>>>>>>>>>>>>>>> bin.huangxb@gmail.com>
> > > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone, for your
> feedback!
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> I have rethought the concept of subdag
> > and
> > > > > >>> task
> > > > > >>>>>>>>>>>>>> groups. I
> > > > > >>>>>>>>>>>>>>>>> think
> > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>> better way to approach this is to entirely
> > remove
> > > > > >>>>>>>>>>> subdag
> > > > > >>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>> introduce
> > > > > >>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>> concept of TaskGroup, which is a container of
> > > > > >> tasks
> > > > > >>>>>>>>>>> along
> > > > > >>>>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>>> their
> > > > > >>>>>>>>>>>>>>>>>>>>> dependencies *without execution/scheduling
> > logic
> > > > > >>> as a
> > > > > >>>>>>>>>>>> DAG*.
> > > > > >>>>>>>>>>>>>> The
> > > > > >>>>>>>>>>>>>>>>> only
> > > > > >>>>>>>>>>>>>>>>>>>>> purpose of it is to group a list of tasks,
> but
> > > > > >> you
> > > > > >>>>>>>>>>> still
> > > > > >>>>>>>>>>>>> need
> > > > > >>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>> add
> > > > > >>>>>>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>>>>> a DAG for execution.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Here is a small code snippet.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> ```
> > > > > >>>>>>>>>>>>>>>>>>>>> class TaskGroup:
> > > > > >>>>>>>>>>>>>>>>>>>>> """
> > > > > >>>>>>>>>>>>>>>>>>>>> A TaskGroup contains a group of tasks.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> If default_args is missing, it will take
> > default
> > > > > >>> args
> > > > > >>>>>>>>>>>> from
> > > > > >>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>> DAG.
> > > > > >>>>>>>>>>>>>>>>>>>>> """
> > > > > >>>>>>>>>>>>>>>>>>>>> def __init__(self, group_id, default_args):
> > > > > >>>>>>>>>>>>>>>>>>>>> pass
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> """
> > > > > >>>>>>>>>>>>>>>>>>>>> You can add tasks to a task group similar to
> > > > > >> adding
> > > > > >>>>>>>>>>> tasks
> > > > > >>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>> DAG
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> This can be declared in a separate file from
> > the
> > > > > >>> dag
> > > > > >>>>>>>>>>> file
> > > > > >>>>>>>>>>>>>>>>>>>>> """
> > > > > >>>>>>>>>>>>>>>>>>>>> download_group =
> TaskGroup(group_id='download',
> > > > > >>>>>>>>>>>>>>>>>>>> default_args=default_args)
> > > > > >>>>>>>>>>>>>>>>>>>>> download_group.add_task(task1)
> > > > > >>>>>>>>>>>>>>>>>>>>> task2.dag = download_group
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> with download_group:
> > > > > >>>>>>>>>>>>>>>>>>>>> task3 = DummyOperator(task_id='task3')
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> [task, task2] >> task3
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> """Add it to a DAG for execution"""
> > > > > >>>>>>>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag',
> > > > > >>>>>>>>>>>>>>> default_args=default_args,
> > > > > >>>>>>>>>>>>>>>>>>>>> schedule_interval='@daily', ...) as dag:
> > > > > >>>>>>>>>>>>>>>>>>>>> start = DummyOperator(task_id='start')
> > > > > >>>>>>>>>>>>>>>>>>>>> start >> download_group
> > > > > >>>>>>>>>>>>>>>>>>>>> # this is equivalent to
> > > > > >>>>>>>>>>>>>>>>>>>>> # start >> [task, task2] >> task3
> > > > > >>>>>>>>>>>>>>>>>>>>> ```
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> With this, we can still reuse a group of
> tasks
> > > > > >> and
> > > > > >>>>>>>>>> set
> > > > > >>>>>>>>>>>>>>>> dependencies
> > > > > >>>>>>>>>>>>>>>>>>>> between
> > > > > >>>>>>>>>>>>>>>>>>>>> them; it avoids the boilerplate code from
> using
> > > > > >>>>>>>>>>>>>> SubDagOperator,
> > > > > >>>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>> we
> > > > > >>>>>>>>>>>>>>>>>>>> can
> > > > > >>>>>>>>>>>>>>>>>>>>> declare dependencies as `task >> task_group
> >>
> > > > > >>> task`.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> User migration wise, we can introduce it
> before
> > > > > >>>>>>>>>> Airflow
> > > > > >>>>>>>>>>>> 2.0
> > > > > >>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>> allow
> > > > > >>>>>>>>>>>>>>>>>>>>> gradual transition. Then we can decide if we
> > > > > >> still
> > > > > >>>>>>>>>> want
> > > > > >>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>> keep
> > > > > >>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>> SubDagOperator or simply remove it.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Any thoughts?
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Cheers,
> > > > > >>>>>>>>>>>>>>>>>>>>> Bin
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime
> > > > > >> Beauchemin <
> > > > > >>>>>>>>>>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> +1, proposal looks good.
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> The original intention was really to have
> > tasks
> > > > > >>>>>>>>>>> groups
> > > > > >>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>> zoom-in/out
> > > > > >>>>>>>>>>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>>>>>>>>> the UI. The original reasoning was to reuse
> > the
> > > > > >>> DAG
> > > > > >>>>>>>>>>>>> object
> > > > > >>>>>>>>>>>>>>>> since
> > > > > >>>>>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>>>> is
> > > > > >>>>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>>> group of tasks, but as highlighted here it
> > does
> > > > > >>>>>>>>>>> create
> > > > > >>>>>>>>>>>>>>>> underlying
> > > > > >>>>>>>>>>>>>>>>>>>>>> confusions since a DAG is much more than
> just
> > a
> > > > > >>>>>>>>>> group
> > > > > >>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>> tasks.
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> Max
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima
> > Joshi <
> > > > > >>>>>>>>>>>>>>>>>>>>> joshipoornima06@gmail.com>
> > > > > >>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Thank you for your email.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin
> > Huang <
> > > > > >>>>>>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*:
> > This
> > > > > >>>>>>>>>>>>>> rewrites
> > > > > >>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing,
> and
> > > > > >>>>>>>>>> it
> > > > > >>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>> give a
> > > > > >>>>>>>>>>>>>>>>>>>> flat
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already
> > > > > >>>>>>>>>> does
> > > > > >>>>>>>>>>>>> this I
> > > > > >>>>>>>>>>>>>>>>> think.
> > > > > >>>>>>>>>>>>>>>>>> At
> > > > > >>>>>>>>>>>>>>>>>>>>> least
> > > > > >>>>>>>>>>>>>>>>>>>>>>> if
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > > > > >>>>>>>>>>> representation,
> > > > > >>>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>> at
> > > > > >>>>>>>>>>>>>>>>> least
> > > > > >>>>>>>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> still keep the subdag entry in the DAG
> > table?
> > > > > >>>>>>>>>> In
> > > > > >>>>>>>>>>> my
> > > > > >>>>>>>>>>>>>>>> proposal
> > > > > >>>>>>>>>>>>>>>>> as
> > > > > >>>>>>>>>>>>>>>>>>>> also
> > > > > >>>>>>>>>>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> draft PR, the idea is to *extract the
> tasks
> > > > > >>>>>>>>>> from
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>> subdag
> > > > > >>>>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>>> add
> > > > > >>>>>>>>>>>>>>>>>>>>>> them
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG
> > > > > >> graph
> > > > > >>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>> look
> > > > > >>>>>>>>>>>>>>>>>> exactly
> > > > > >>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> same as without subdag but with metadata
> > > > > >>>>>>>>>> attached
> > > > > >>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>> those
> > > > > >>>>>>>>>>>>>>>>>>>> sections.
> > > > > >>>>>>>>>>>>>>>>>>>>>>> These
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> metadata will be later on used to render
> in
> > > > > >> the
> > > > > >>>>>>>>>>> UI.
> > > > > >>>>>>>>>>>>> So
> > > > > >>>>>>>>>>>>>>>> after
> > > > > >>>>>>>>>>>>>>>>>>>> parsing
> > > > > >>>>>>>>>>>>>>>>>>>>> (
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just
> > output
> > > > > >>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>> *root_dag
> > > > > >>>>>>>>>>>>>>>>>>>> *instead
> > > > > >>>>>>>>>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>>>>>>>>> *root_dag +
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > > > > >>>>>>>>>>>>>>>>>> current_group=section-1,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome
> for
> > > > > >>>>>>>>>>> naming
> > > > > >>>>>>>>>>>>>>>>>>> suggestions),
> > > > > >>>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> reason for parent_group is that we can
> have
> > > > > >>>>>>>>>>> nested
> > > > > >>>>>>>>>>>>>> group
> > > > > >>>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>>>> still
> > > > > >>>>>>>>>>>>>>>>>>>>>> be
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> able to capture the dependency.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> Runtime DAG:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> While at the UI, what we see would be
> > > > > >> something
> > > > > >>>>>>>>>>>> like
> > > > > >>>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>> by
> > > > > >>>>>>>>>>>>>>>>>>>>> utilizing
> > > > > >>>>>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> metadata, and then we can expand or zoom
> > into
> > > > > >>>>>>>>>> in
> > > > > >>>>>>>>>>>> some
> > > > > >>>>>>>>>>>>>>> way.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> The benefits I can see is that:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > > > > >>>>>>>>>>> complexity
> > > > > >>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>> SubDag
> > > > > >>>>>>>>>>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>>>>>>>>>> execution
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> > > > > >>>>>>>>>> using
> > > > > >>>>>>>>>>>>>> SubDag.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> 2. Still have the benefits of modularized
> > and
> > > > > >>>>>>>>>>>>> reusable
> > > > > >>>>>>>>>>>>>>> dag
> > > > > >>>>>>>>>>>>>>>>> code
> > > > > >>>>>>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> declare dependencies between them. And
> with
> > > > > >> the
> > > > > >>>>>>>>>>> new
> > > > > >>>>>>>>>>>>>>>>>>> SubDagOperator
> > > > > >>>>>>>>>>>>>>>>>>>>> (see
> > > > > >>>>>>>>>>>>>>>>>>>>>>> AIP
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> or draft PR), we can use the same
> > dag_factory
> > > > > >>>>>>>>>>>>> function
> > > > > >>>>>>>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>>>>>>>> generating 1
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for
> > SubDag
> > > > > >>>>>>>>>>> (in
> > > > > >>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>> case,
> > > > > >>>>>>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>>>>>>>> just
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> extract all underlying tasks and append to
> > the
> > > > > >>>>>>>>>>> root
> > > > > >>>>>>>>>>>>>> dag).
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> - Then it gets to the idea of replacing
> > subdag
> > > > > >>>>>>>>>>>> with a
> > > > > >>>>>>>>>>>>>>>>>> simpler
> > > > > >>>>>>>>>>>>>>>>>>>>>> concept
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> by Ash: the proposed change basically
> drains
> > > > > >>>>>>>>>> out
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>> contents
> > > > > >>>>>>>>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>>>> SubDag
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> and becomes more like
> > > > > >>>>>>>>>>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > > > > >>>>>>>>>>>>>>>>>>>>>>> (forgive
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> me about the crazy name..). In this case,
> it
> > > > > >> is
> > > > > >>>>>>>>>>>> still
> > > > > >>>>>>>>>>>>>>>>>>> necessary
> > > > > >>>>>>>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>>>>>>> keep the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> concept of subdag as it is nothing more
> > than a
> > > > > >>>>>>>>>>>> name?
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up.
> > Thanks
> > > > > >>>>>>>>>>>> Chris
> > > > > >>>>>>>>>>>>>>> Palmer
> > > > > >>>>>>>>>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>>>>>>>> helping
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> conceptualize the functionality of
> > TaskGroup,
> > > > > >> I
> > > > > >>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>> just
> > > > > >>>>>>>>>>>>>>>>> paste
> > > > > >>>>>>>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>>>>>>> here.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between
> Tasks
> > > > > >>>>>>>>>> in
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>> same
> > > > > >>>>>>>>>>>>>>>>>>>> TaskGroup,
> > > > > >>>>>>>>>>>>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task
> > in
> > > > > >>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>>>>>> either a
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task
> not
> > > > > >>>>>>>>>> in
> > > > > >>>>>>>>>>>> any
> > > > > >>>>>>>>>>>>>>> group
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > > >>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>> either
> > > > > >>>>>>>>>>>>>>>>>>>>> other
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a
> TaskGroup
> > > > > >>>>>>>>>> as
> > > > > >>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>> single
> > > > > >>>>>>>>>>>>>>>>>>>> "object",
> > > > > >>>>>>>>>>>>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what
> the
> > > > > >>>>>>>>>>>>> "status"
> > > > > >>>>>>>>>>>>>>> of a
> > > > > >>>>>>>>>>>>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>>>>>>>>>>>> was
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> I agree with Chris:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > > > > >>>>>>>>>>> executor), I
> > > > > >>>>>>>>>>>>>> think
> > > > > >>>>>>>>>>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>>>>>>>>>>>> should
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> be ignored during execution. (unless we
> > decide
> > > > > >>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>> implement
> > > > > >>>>>>>>>>>>>>>>>> some
> > > > > >>>>>>>>>>>>>>>>>>>>>> metadata
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> operations that allows start/stop a group
> of
> > > > > >>>>>>>>>>> tasks
> > > > > >>>>>>>>>>>>>> etc.)
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> - From the UI's View, it should be able to
> > > > > >> pick
> > > > > >>>>>>>>>>> up
> > > > > >>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>> individual
> > > > > >>>>>>>>>>>>>>>>>>>>>> tasks'
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > > > > >>>>>>>>>> status
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> Bin
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > > > > >>>>>>>>>> Imberman
> > > > > >>>>>>>>>>> <
> > > > > >>>>>>>>>>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>`
> > > > > >> operator
> > > > > >>>>>>>>>>> to
> > > > > >>>>>>>>>>>>> tie
> > > > > >>>>>>>>>>>>>>> dags
> > > > > >>>>>>>>>>>>>>>>>>>> together
> > > > > >>>>>>>>>>>>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>>>>>>>>>>> I
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder
> if
> > > > > >> we
> > > > > >>>>>>>>>>>> could
> > > > > >>>>>>>>>>>>>>>>>> essentially
> > > > > >>>>>>>>>>>>>>>>>>>>> write
> > > > > >>>>>>>>>>>>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > > > > >>>>>>>>>>>> starter-tasks
> > > > > >>>>>>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>>>>>> DAG.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a
> > mostly
> > > > > >>>>>>>>>> UI
> > > > > >>>>>>>>>>>>>> concept.
> > > > > >>>>>>>>>>>>>>>> It
> > > > > >>>>>>>>>>>>>>>>>>>> doesn’t
> > > > > >>>>>>>>>>>>>>>>>>>>>> need
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> to execute separately, you’re just adding
> > > > > >> more
> > > > > >>>>>>>>>>>> tasks
> > > > > >>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>> queue
> > > > > >>>>>>>>>>>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> be executed when there are resources
> > > > > >>>>>>>>>> available.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> via Newton Mail [
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> ]
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris
> > Palmer
> > > > > >>>>>>>>>> <
> > > > > >>>>>>>>>>>>>>>>>>> chris@crpalmer.com
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly
> complex
> > > > > >>>>>>>>>>>>>> abstraction.
> > > > > >>>>>>>>>>>>>>> I
> > > > > >>>>>>>>>>>>>>>>>> think
> > > > > >>>>>>>>>>>>>>>>>>>> what
> > > > > >>>>>>>>>>>>>>>>>>>>>> is
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On
> a
> > > > > >>>>>>>>>> high
> > > > > >>>>>>>>>>>>> level
> > > > > >>>>>>>>>>>>>> I
> > > > > >>>>>>>>>>>>>>>>> think
> > > > > >>>>>>>>>>>>>>>>>>> you
> > > > > >>>>>>>>>>>>>>>>>>>>> want
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> functionality:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between
> Tasks
> > > > > >> in
> > > > > >>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>> same
> > > > > >>>>>>>>>>>>>>>>>>> TaskGroup,
> > > > > >>>>>>>>>>>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task
> > in
> > > > > >> a
> > > > > >>>>>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>>>> either
> > > > > >>>>>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task
> not
> > > > > >> in
> > > > > >>>>>>>>>>> any
> > > > > >>>>>>>>>>>>>> group
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > > >>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>> either
> > > > > >>>>>>>>>>>>>>>>>>> other
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a
> TaskGroup
> > > > > >>>>>>>>>> as a
> > > > > >>>>>>>>>>>>>> single
> > > > > >>>>>>>>>>>>>>>>>>> "object",
> > > > > >>>>>>>>>>>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what
> the
> > > > > >>>>>>>>>>>> "status"
> > > > > >>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>>>>>>>>>>> was
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Not sure if it would need to be a top
> level
> > > > > >>>>>>>>>>> object
> > > > > >>>>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>> its
> > > > > >>>>>>>>>>>>>>>>>> own
> > > > > >>>>>>>>>>>>>>>>>>>>>> database
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> table and model or just another attribute
> > on
> > > > > >>>>>>>>>>>> tasks.
> > > > > >>>>>>>>>>>>> I
> > > > > >>>>>>>>>>>>>>>> think
> > > > > >>>>>>>>>>>>>>>>>> you
> > > > > >>>>>>>>>>>>>>>>>>>>> could
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> build
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> it in a way such that from the schedulers
> > > > > >>>>>>>>>> point
> > > > > >>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>> view
> > > > > >>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>> DAG
> > > > > >>>>>>>>>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
> > > > > >>>>>>>>>> differently.
> > > > > >>>>>>>>>>> So
> > > > > >>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>> really
> > > > > >>>>>>>>>>>>>>>>>>> just
> > > > > >>>>>>>>>>>>>>>>>>>>>>> becomes
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> shortcut for setting dependencies between
> > > > > >> sets
> > > > > >>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>> Tasks,
> > > > > >>>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>>>> allows
> > > > > >>>>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>> UI
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> to simplify the render of the DAG
> > structure.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> Chris
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan
> > Davydov
> > > > > >>>>>>>>>>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Agree with James (and think it's
> actually
> > > > > >>>>>>>>>> the
> > > > > >>>>>>>>>>>> more
> > > > > >>>>>>>>>>>>>>>>> important
> > > > > >>>>>>>>>>>>>>>>>>>> issue
> > > > > >>>>>>>>>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> fix),
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> but I am still convinced Ash' idea is
> the
> > > > > >>>>>>>>>>> right
> > > > > >>>>>>>>>>>>> way
> > > > > >>>>>>>>>>>>>>>>> forward
> > > > > >>>>>>>>>>>>>>>>>>>> (just
> > > > > >>>>>>>>>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> might
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> require a bit more work to deprecate
> than
> > > > > >>>>>>>>>>> adding
> > > > > >>>>>>>>>>>>>>> visual
> > > > > >>>>>>>>>>>>>>>>>>> grouping
> > > > > >>>>>>>>>>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> UI).
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> There was a previous thread about this
> FYI
> > > > > >>>>>>>>>>> with
> > > > > >>>>>>>>>>>>> more
> > > > > >>>>>>>>>>>>>>>>> context
> > > > > >>>>>>>>>>>>>>>>>>> on
> > > > > >>>>>>>>>>>>>>>>>>>>> why
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> subdags
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>
> > https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > > >>>>>>>>>>>>>>>>>>>>>> . A
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> solution I outline there to Jame's
> problem
> > > > > >>>>>>>>>> is
> > > > > >>>>>>>>>>>> e.g.
> > > > > >>>>>>>>>>>>>>>>> enabling
> > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> operator
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs
> as
> > > > > >>>>>>>>>>>> well. I
> > > > > >>>>>>>>>>>>>> see
> > > > > >>>>>>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>>>>>> being
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> separate
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> from Ash' solution for DAG grouping in
> the
> > > > > >>>>>>>>>> UI
> > > > > >>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>> one
> > > > > >>>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>> two
> > > > > >>>>>>>>>>>>>>>>>>>>>> items
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> required to replace all existing subdag
> > > > > >>>>>>>>>>>>>> functionality.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> I've been working with subdags for 3
> years
> > > > > >>>>>>>>>> and
> > > > > >>>>>>>>>>>>> they
> > > > > >>>>>>>>>>>>>>> are
> > > > > >>>>>>>>>>>>>>>>>>> always a
> > > > > >>>>>>>>>>>>>>>>>>>>>> giant
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> pain
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> to use. They are a constant source of
> user
> > > > > >>>>>>>>>>>>> confusion
> > > > > >>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>>>> breakages
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> during
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone
> :).
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
> > > > > >>>>>>>>>> Coder <
> > > > > >>>>>>>>>>>>>>>>>>>> jcoder01@gmail.com>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just
> a
> > > > > >>>>>>>>>> UI
> > > > > >>>>>>>>>>>>>>> concept. I
> > > > > >>>>>>>>>>>>>>>>> use
> > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>> subdag
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> operator to simplify dependencies too.
> If
> > > > > >>>>>>>>>>> you
> > > > > >>>>>>>>>>>>>> have a
> > > > > >>>>>>>>>>>>>>>>> group
> > > > > >>>>>>>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>>>>>>>> tasks
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> need to finish before another group of
> > > > > >>>>>>>>>> tasks
> > > > > >>>>>>>>>>>>>> start,
> > > > > >>>>>>>>>>>>>>>>> using
> > > > > >>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>> subdag
> > > > > >>>>>>>>>>>>>>>>>>>>>>> is
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> pretty quick way to set those
> > dependencies
> > > > > >>>>>>>>>>>> and I
> > > > > >>>>>>>>>>>>>>> think
> > > > > >>>>>>>>>>>>>>>>>> also
> > > > > >>>>>>>>>>>>>>>>>>>> make
> > > > > >>>>>>>>>>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> easier
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> to follow the dag code.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
> > > > > >>>>>>>>>> Hamlin
> > > > > >>>>>>>>>>> <
> > > > > >>>>>>>>>>>>>>>>>>>>> hamlin.kn@gmail.com>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > > > >>>>>>>>>>>>>> Berlin-Taylor
> > > > > >>>>>>>>>>>>>>> <
> > > > > >>>>>>>>>>>>>>>>>>>>>> ash@apache.org
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Question:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator
> > > > > >>>>>>>>>>>> anymore?
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just
> > > > > >>>>>>>>>>>>> replacing
> > > > > >>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>> UI
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> grouping
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less
> > > > > >>>>>>>>>> to
> > > > > >>>>>>>>>>>> get
> > > > > >>>>>>>>>>>>>>>> wrong,
> > > > > >>>>>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>>>>> closer
> > > > > >>>>>>>>>>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> what
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> users actually want to achieve with
> > > > > >>>>>>>>>>>> subdags?
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in
> > > > > >>>>>>>>>>>> subdags
> > > > > >>>>>>>>>>>>>>> could
> > > > > >>>>>>>>>>>>>>>>>> start
> > > > > >>>>>>>>>>>>>>>>>>>>>> running
> > > > > >>>>>>>>>>>>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should
> > > > > >>>>>>>>>> we
> > > > > >>>>>>>>>>>> not
> > > > > >>>>>>>>>>>>>>> also
> > > > > >>>>>>>>>>>>>>>>> just
> > > > > >>>>>>>>>>>>>>>>>>>>>>> _enitrely_
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> remove
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace
> > > > > >>>>>>>>>> it
> > > > > >>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>>> something
> > > > > >>>>>>>>>>>>>>>>>>>>>> simpler.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I
> > > > > >>>>>>>>>>> haven't
> > > > > >>>>>>>>>>>>> used
> > > > > >>>>>>>>>>>>>>>> them
> > > > > >>>>>>>>>>>>>>>>>>>>>> extensively
> > > > > >>>>>>>>>>>>>>>>>>>>>>> so
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> may
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it
> > > > > >>>>>>>>>>>> has(?)
> > > > > >>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>> be
> > > > > >>>>>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>> form
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own
> > > > > >>>>>>>>>> schedule_interval,
> > > > > >>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>> has
> > > > > >>>>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>>>> match
> > > > > >>>>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> parent
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> dag
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their
> own.
> > > > > >>>>>>>>>>>> (Does
> > > > > >>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>> make
> > > > > >>>>>>>>>>>>>>>>>>> sense
> > > > > >>>>>>>>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>>>>>> do
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> this?
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the
> > > > > >>>>>>>>>>> sub
> > > > > >>>>>>>>>>>>> dag
> > > > > >>>>>>>>>>>>>>>> would
> > > > > >>>>>>>>>>>>>>>>>>> never
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> execute, so
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to
> > > > > >>>>>>>>>>>>> operator a
> > > > > >>>>>>>>>>>>>>>>> subdag
> > > > > >>>>>>>>>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> always
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> bit of a kludge.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thoughts?
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash
> > > > > >>>>>>>>>>>>>> Berlin-Taylor <
> > > > > >>>>>>>>>>>>>>>>>>>>>> ash@apache.org>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Workon sub-dags is much needed, I'm
> > > > > >>>>>>>>>>>>> excited
> > > > > >>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>> see
> > > > > >>>>>>>>>>>>>>>>>> how
> > > > > >>>>>>>>>>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> progresses.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > > > >>>>>>>>>>> parsing*:
> > > > > >>>>>>>>>>>>> This
> > > > > >>>>>>>>>>>>>>>>>> rewrites
> > > > > >>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > > > >>>>>>>>>>> parsing,
> > > > > >>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>>>>> give a
> > > > > >>>>>>>>>>>>>>>>>>>>>>> flat
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation
> > > > > >>>>>>>>>>>> already
> > > > > >>>>>>>>>>>>>> does
> > > > > >>>>>>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>>>> I
> > > > > >>>>>>>>>>>>>>>>>>>>> think.
> > > > > >>>>>>>>>>>>>>>>>>>>>>> At
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> least
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> if
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here
> > > > > >>>>>>>>>>>> correctly.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin
> > > > > >>>>>>>>>>>> Huang <
> > > > > >>>>>>>>>>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone and
> > > > > >>>>>>>>>>>> collect
> > > > > >>>>>>>>>>>>>>>>> feedback
> > > > > >>>>>>>>>>>>>>>>>> on
> > > > > >>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> AIP-34
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> on
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was
> > > > > >>>>>>>>>>>>>> previously
> > > > > >>>>>>>>>>>>>>>>>> briefly
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> mentioned in
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be
> > > > > >>>>>>>>>>> done
> > > > > >>>>>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>>>> Airflow
> > > > > >>>>>>>>>>>>>>>>>>> 2.0,
> > > > > >>>>>>>>>>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> one of
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator
> > > > > >>>>>>>>>>> attach
> > > > > >>>>>>>>>>>>>> tasks
> > > > > >>>>>>>>>>>>>>>> back
> > > > > >>>>>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>> root
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> DAG.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving
> > > > > >>>>>>>>>>>>>> SubDagOperator
> > > > > >>>>>>>>>>>>>>>>>> related
> > > > > >>>>>>>>>>>>>>>>>>>>>> issues
> > > > > >>>>>>>>>>>>>>>>>>>>>>> by
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> reattaching
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag
> > > > > >>>>>>>>>> while
> > > > > >>>>>>>>>>>>>>> respecting
> > > > > >>>>>>>>>>>>>>>>>>>>>> dependencies
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> during
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping
> > > > > >>>>>>>>>> effect
> > > > > >>>>>>>>>>>> on
> > > > > >>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>> UI
> > > > > >>>>>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>>>> be
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> achieved
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> through
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory
> > > > > >>>>>>>>>>>> function
> > > > > >>>>>>>>>>>>>> more
> > > > > >>>>>>>>>>>>>>>>>>> reusable
> > > > > >>>>>>>>>>>>>>>>>>>>>>> because
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>> you
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> don't
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and
> > > > > >>>>>>>>>>>>>>> child_dag_name
> > > > > >>>>>>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>> function
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> signature
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> anymore.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > > > >>>>>>>>>>> parsing*:
> > > > > >>>>>>>>>>>>> This
> > > > > >>>>>>>>>>>>>>>>>> rewrites
> > > > > >>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > > > >>>>>>>>>>> parsing,
> > > > > >>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>>>>> give a
> > > > > >>>>>>>>>>>>>>>>>>>>>>> flat
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The
> > > > > >>>>>>>>>> new
> > > > > >>>>>>>>>>>>>>>>> SubDagOperator
> > > > > >>>>>>>>>>>>>>>>>>>> acts
> > > > > >>>>>>>>>>>>>>>>>>>>>>> like a
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> container and most of the original
> > > > > >>>>>>>>>>>>> methods
> > > > > >>>>>>>>>>>>>>> are
> > > > > >>>>>>>>>>>>>>>>>>> removed.
> > > > > >>>>>>>>>>>>>>>>>>>>> The
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> signature is
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also changed to *subdag_factory
> > > > > >>>>>>>>>> *with
> > > > > >>>>>>>>>>>>>>>>> *subdag_args
> > > > > >>>>>>>>>>>>>>>>>>> *and
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> *subdag_kwargs*.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is similar to the
> > > > > >>>>>>>>>> PythonOperator
> > > > > >>>>>>>>>>>>>>>> signature.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add
> > > > > >>>>>>>>>>>>>>> current_group
> > > > > >>>>>>>>>>>>>>>> &
> > > > > >>>>>>>>>>>>>>>>>>>>>> parent_group
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> attributes
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to BaseOperator*: This metadata is
> > > > > >>>>>>>>>>> used
> > > > > >>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>> group
> > > > > >>>>>>>>>>>>>>>>>>> tasks
> > > > > >>>>>>>>>>>>>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rendering at
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UI level. It may potentially extend
> > > > > >>>>>>>>>>>>> further
> > > > > >>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>> group
> > > > > >>>>>>>>>>>>>>>>>>>>>>> arbitrary
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> tasks
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to
> > > > > >>>>>>>>>>> allow
> > > > > >>>>>>>>>>>>>>>>> group-level
> > > > > >>>>>>>>>>>>>>>>>>>>>> operations
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> (i.e.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> stop/trigger a group of task within
> > > > > >>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>> dag)
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*:
> > > > > >>>>>>>>>> Proposed
> > > > > >>>>>>>>>>>> UI
> > > > > >>>>>>>>>>>>>>>>>> modification
> > > > > >>>>>>>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>>>>>>> allow
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a
> > > > > >>>>>>>>>>>> flat
> > > > > >>>>>>>>>>>>>>>>> structure
> > > > > >>>>>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>>>>> pair
> > > > > >>>>>>>>>>>>>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> first
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> change instead of the original
> > > > > >>>>>>>>>>>>> hierarchical
> > > > > >>>>>>>>>>>>>>>>>>> structure.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please see related documents and
> > > > > >>>>>>>>>> PRs
> > > > > >>>>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>>> details:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AIP:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Original Issue:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > https://github.com/apache/airflow/issues/8078
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Draft PR:
> > > > > >>>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/pull/9243
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any
> > > > > >>>>>>>>>>>>> aspects
> > > > > >>>>>>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>>>>> you
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>> agree/disagree
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with or
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need more clarification (especially
> > > > > >>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>> third
> > > > > >>>>>>>>>>>>>>>>>> change
> > > > > >>>>>>>>>>>>>>>>>>>>>>> regarding
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup).
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Any comments are welcome and I am
> > > > > >>>>>>>>>>>> looking
> > > > > >>>>>>>>>>>>>>>> forward
> > > > > >>>>>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>>>> it!
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bin
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Thanks & Regards
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Poornima
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Jarek Potiuk
> > > > > > >>>>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> M: +48 660 796 129
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Jarek Potiuk
> > > > > > >>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> M: +48 660 796 129
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> *Jacob Ferriero*
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> Strategic Cloud Engineer: Data Engineering
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> jferriero@google.com
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> 617-714-2509
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by Gerard Casas Saez <gc...@twitter.com.INVALID>.
As I mentioned in the issue, I believe prefixing group_id is a nice thing
as it makes TaskGroup an equivalent of SubDagOperator. Internally we have
a similar concept to TaskGroup called FlattenedSubDagOperator that
appends the group_id to the task_id.

One of our main internal uses of this operator is hyperparameter
tuning of ML models. For that we provide an abstraction where users provide a
SubDag that takes in a dictionary of hyperparameters (through XComArg) and
pushes an XCom that is a dictionary of metrics. This task group is usually a
combination of model training and model analysis, but it can be whatever
you want. We create a hyperparameter tuning DAG for the user easily by
instantiating this SubDag/TaskGroup many times, once for each of the
experiments they need to perform.

Now, if group_id is not appended to task_id, this type of reuse of task
groups would not be possible. You would need to ask the user to parametrize
task_id, and that's a bit counterintuitive as Airflow task_ids are not
templated. Another option is to make this behaviour customizable and have
a flag that activates it on the TaskGroup.
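
To make the reuse argument concrete, here is a rough plain-Python sketch
(no Airflow imports). The helper names build_task_ids and duplicates, the
"train"/"analyze" task names and the "." separator are all made up for
illustration; this only simulates how task_ids would come out when the same
group factory is instantiated once per experiment, with and without the
group_id prefix.

```python
from typing import List


def build_task_ids(group_ids: List[str], prefix_group_id: bool) -> List[str]:
    """Simulate instantiating the same two-task group once per group_id.

    Plain-Python stand-in for a TaskGroup factory; not Airflow code.
    """
    task_ids = []
    for group_id in group_ids:
        for local_id in ("train", "analyze"):
            # With prefixing, the group_id is added to the user-supplied id;
            # without it, the user-supplied id is used as-is.
            task_ids.append(f"{group_id}.{local_id}" if prefix_group_id else local_id)
    return task_ids


def duplicates(task_ids: List[str]) -> List[str]:
    """Return the task_ids that occur more than once, in first-seen order."""
    seen, dupes = set(), []
    for task_id in task_ids:
        if task_id in seen and task_id not in dupes:
            dupes.append(task_id)
        seen.add(task_id)
    return dupes


experiments = [f"experiment_{i}" for i in range(3)]
prefixed = build_task_ids(experiments, prefix_group_id=True)
unprefixed = build_task_ids(experiments, prefix_group_id=False)

print(duplicates(prefixed))    # no collisions when the group_id is prefixed
print(duplicates(unprefixed))  # the reused ids collide without the prefix
```

Without the prefix, the factory author has to thread something like an
experiment index into every task_id by hand, which is the parametrization
burden I mentioned above.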


Gerard Casas Saez
Twitter | Cortex | @casassaez <http://twitter.com/casassaez>


On Tue, Sep 1, 2020 at 1:03 AM Yu Qian <yu...@gmail.com> wrote:

> The vote for this AIP-34
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
> passed. However, there's an interesting discussion going on here
> <https://github.com/apache/airflow/pull/10153#discussion_r480247681>
> regarding whether task_id should be automatically prefixed with group_id of
> TaskGroup. So I'm bringing it up in this email thread for discussion.
>
> Plan A: Prefix task_id with group_id of TaskGroup. This is the original
> plan in AIP-34
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> The task_id argument passed to an operator just needs to be unique across
> the TaskGroup. The actual task_id is prefixed with the group_id so task_id
> is guaranteed to be unique across the DAG.
>
> Plan B: Do not prefix task_id with group_id of TaskGroup. The task_id
> argument passed to the operator is the actual task_id. So the user is
> forced to make sure task_id is unique across the whole DAG.
>
> Obviously the convenience of Plan A is not free of charge. I’m summarizing
> some of the pros and cons in this table. There are two examples at the
> bottom illustrating the different usage. I was convinced by houqp on the
> github comments and some of my own experiments that Plan B has more
> advantages and avoids surprises. I'm going to update AIP-34
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
> according to Plan B unless I hear strong objections before 20200903 7am
> UTC.
>
>
>
>
> Ease of Use
>   Plan A: Easier to use for new DAGs
>   Plan B: Slightly more work on the user to maintain task_id uniqueness
>
> Implementation
>   Plan A: A little more complicated. Each group needs to know its parent’s
>           group_id in order to prefix the group_id correctly.
>   Plan B: Implementation is simpler. No need to know the parent TaskGroup’s
>           group_id.
>
> Ease of Migration
>   Plan A: task_id will change if TaskGroup is introduced into an existing
>           DAG. Existing tasks put into a TaskGroup will appear like new
>           tasks if the DAG already has some historical DagRun. This may
>           pose a barrier to adoption of TaskGroup.
>   Plan B: No change in task_id when an existing task is put into a
>           TaskGroup. Migrating existing DAGs to adopt TaskGroup will be
>           easier.
>
> Actual task_id
>   Plan A: Actual task_id tend to be longer because it’s always prefixed
>           with group_id, especially if the task is in a nested TaskGroup.
>   Plan B: Actual task_id tend to be shorter because users control the
>           actual task_id themselves.
>
> Graph label
>   Plan A: Labels on Graph View tend to be shorter because task_id only
>           needs to be unique within the TaskGroup
>   Plan B: Labels on Graph View tend to be longer because it displays the
>           actual task_id, which is a unique str across the DAG.
>
>
> Plan A Example:
>
> def create_section():
>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(5)]
>
>     with TaskGroup("inside_section_1") as inside_section_1:
>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(3)]
>
>     with TaskGroup("inside_section_2") as inside_section_2:
>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(3)]
>
>     dummies[-1] >> inside_section_1
>     dummies[-2] >> inside_section_2
>     inside_section_1 >> inside_section_2
>
>
> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
>     start = DummyOperator(task_id="start")
>
>     with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
>         create_section()
>
>     some_other_task = DummyOperator(task_id="some-other-task")
>
>     with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
>         create_section()
>
>     end = DummyOperator(task_id='end')
>
>     start >> section_1 >> some_other_task >> section_2 >> end
>
>
> Plan B Example:
>
> def create_section(section_num):
>     dummies = [DummyOperator(task_id=f'task-{section_num}.{i + 1}') for i in range(5)]
>
>     with TaskGroup(f"section_{section_num}.1") as inside_section_1:
>         _ = [DummyOperator(task_id=f'task-{section_num}.1.{i + 1}',) for i in range(3)]
>
>     with TaskGroup(f"section_{section_num}.2") as inside_section_2:
>         _ = [DummyOperator(task_id=f'task-{section_num}.2.{i + 1}',) for i in range(3)]
>
>     dummies[-1] >> inside_section_1
>     dummies[-2] >> inside_section_2
>     inside_section_1 >> inside_section_2
>
>
> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
>     start = DummyOperator(task_id="start")
>
>     with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
>         create_section(1)
>
>     some_other_task = DummyOperator(task_id="some-other-task")
>
>     with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
>         create_section(2)
>
>     end = DummyOperator(task_id='end')
>
>     start >> section_1 >> some_other_task >> section_2 >> end
>
>
> On Sat, Aug 22, 2020 at 1:02 AM Gerard Casas Saez
> <gc...@twitter.com.invalid> wrote:
>
> > Agree on this being non-blocking.
> >
> > Regarding moving to vote, you can take care. Just open a new email thread
> > on dev list and call for a vote. You can see this example from Tomek for
> > AIP-31:
> >
> > https://lists.apache.org/thread.html/r16ee2b6263f5849a2e2b1b6c3ba39fb3d643022f052216cb19a0a8da%40%3Cdev.airflow.apache.org%3E
> >
> > Best,
> >
> >
> > Gerard Casas Saez
> > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> >
> >
> > On Thu, Aug 20, 2020 at 7:10 PM Yu Qian <yu...@gmail.com> wrote:
> >
> > > Hi, Gerard, yes I agree it's possible to do this at UI level without any
> > > fundamental change to the implementation. If expand_group() sees that two
> > > groups are fully connected (i.e. every task in one parent group depends on
> > > every task in another parent group), it can decide to collapse all those
> > > children edges into a single edge between the parent groups to reduce the
> > > burden of the layout() function. However, I did not find any existing
> > > algorithm to do this within dagre so we'll likely need to implement this
> > > ourselves. Another hiccup is that at the moment it doesn't seem to be
> > > possible to call setEdge() between two parent groups (aka clusters). If
> > > someone has ideas how to do this please feel free to contribute.
> > >
> > > One other consideration is that this example is only an extreme case.
> > > There are other in-between cases that still require user intervention.
> > > Let's say 90% of tasks in group1 depend on 90% of tasks in group2 and
> > > both groups have more than 100 tasks. This will still cause a lot of edges
> > > on the graph and it's even harder to reduce because the parent groups are
> > > not fully connected so it's inaccurate to reduce them to a single edge
> > > between the parents. In those cases, the user may still need to do
> > > something themselves, e.g. adding some DummyOperator to the DAG to cut
> > > down the edges. There will be some tradeoff because DummyOperator takes a
> > > short while to execute like you mentioned.
> > >
> > > There is a lot of room for improvement, but I don't think that's a
> > > blocking issue for this AIP? So if you can move it to the voting stage
> > > that'll be fantastic.
> > >
> > >
> > > On Thu, Aug 20, 2020 at 4:21 PM 耀 周 <zh...@icloud.com.invalid>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > > 2020年8月18日 23:55,Gerard Casas Saez <gcasassaez@twitter.com
> .INVALID>
> > > 写道:
> > > > >
> > > > > > Is it not possible to solve this at the UI level? Aka tell dagre to only
> > > > > > add 1 edge to the group instead of to all nodes in the group? No need to
> > > > > > do SubDag behaviour, but just reduce the edges on the graph. Should
> > > > > > reduce load time if I understand correctly.
> > > > > >
> > > > > > I would strongly avoid the Dummy operator since it will introduce delays
> > > > > > on operator execution (as it will need to execute 1 dummy operator and
> > > > > > that can be expensive imo).
> > > > > >
> > > > > > Overall though proposal looks good, unless anyone opposes it, I would
> > > > > > move this to vote mode :D
> > > > >
> > > > > Gerard Casas Saez
> > > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > >
> > > > >
> > > > > > On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yu...@gmail.com> wrote:
> > > > > >
> > > > > >> Hi, All,
> > > > > >> Here's the updated AIP-34
> > > > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> > > > > >> The PR has been fine-tuned with better UI interactions and added
> > > > > >> serialization of TaskGroup: https://github.com/apache/airflow/pull/10153
> > > > > >>
> > > > > >> Here are some experiment results:
> > > > > >> A made-up dag containing 403 tasks and 5696 edges, grouped as in the
> > > > > >> code at the bottom of this email. Note that inside_section_2 is
> > > > > >> intentionally made to depend on all tasks in inside_section_1 to
> > > > > >> generate a large number of edges. The observation is that opening the
> > > > > >> top level graph is very quick, around 270ms. Expanding groups that don't
> > > > > >> have a lot of dense dependencies on other groups is also hardly
> > > > > >> noticeable. E.g. expanding section_1 takes 330ms. The part that takes
> > > > > >> time is when expanding both groups inside_section_1 and
> > > > > >> inside_section_2. Because there are 2500 edges between these two inner
> > > > > >> groups, it took 63 seconds to expand both of them. The majority of the
> > > > > >> time (more than 62 seconds) is actually taken by the layout() function
> > > > > >> in dagre. In other words, it's very fast to add nodes and edges, but
> > > > > >> laying them out on the graph takes time. This issue is not actually a
> > > > > >> problem specific to TaskGroup. Without TaskGroup, if a DAG contains too
> > > > > >> many edges, it takes time to lay out the graph too.
> > > > > >>
> > > > > >> On the other hand, a more realistic experiment with a production DAG
> > > > > >> containing about 400 tasks and 700 edges showed that grouping tasks into
> > > > > >> three levels of nested TaskGroup cut the upfront page opening time from
> > > > > >> around 6s to 500ms. (Obviously the time is paid back when the user
> > > > > >> gradually expands all the groups one by one, but normally people don't
> > > > > >> need to expand every group every time so it's still a big saving). The
> > > > > >> experiments were done on OS X Mojave, 2.2 GHz, Intel Core i7, 16GB
> > > > > >> Memory, Chrome.
> > > > > >>
> > > > > >> I can see a few possible improvements to TaskGroup (or how it's used)
> > > > > >> that can be done as a next-step:
> > > > > >> 1). Like Gerard suggested, we can implement lazy-loading. Instead of
> > > > > >> displaying the whole DAG, we can limit the Graph View to show only a
> > > > > >> single TaskGroup, omitting its edges going out to other TaskGroups. This
> > > > > >> behaviour is more like SubDagOperator where users can zoom into/out of a
> > > > > >> TaskGroup and look at only tasks within that TaskGroup as if those are
> > > > > >> the only tasks on the DAG. This can be done with either background
> > > > > >> javascript calls or by making a new get request with filtering
> > > > > >> parameters. Obviously the downside is that it's not as explicit as
> > > > > >> showing all the dependencies on the graph.
> > > > > >> 2). Users can improve the organization of the DAG themselves to reduce
> > > > > >> the number of edges. E.g. if every task in group2 depends on every task
> > > > > >> in group1, instead of doing group1 >> group2, they can add a
> > > > > >> DummyOperator in between and do this: group1 >> dummy >> group2. This
> > > > > >> cuts down the number of edges significantly and page load becomes much
> > > > > >> faster.
> > > > > >> 3). If we really want, we can improve the >> operator of TaskGroup to do
> > > > > >> 2) automatically. If it sees that both sides of >> are TaskGroup, it can
> > > > > >> create a DummyOperator on behalf of the user. The downside is that it
> > > > > >> may be too much magic.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Qian
> > > > >>
> > > > > >> def create_section():
> > > > > >>     """
> > > > > >>     Create tasks in the outer section.
> > > > > >>     """
> > > > > >>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
> > > > > >>
> > > > > >>     with TaskGroup("inside_section_1") as inside_section_1:
> > > > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> > > > > >>
> > > > > >>     with TaskGroup("inside_section_2") as inside_section_2:
> > > > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> > > > > >>
> > > > > >>     dummies[-1] >> inside_section_1
> > > > > >>     dummies[-2] >> inside_section_2
> > > > > >>     inside_section_1 >> inside_section_2
> > > > > >>
> > > > > >>
> > > > > >> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> > > > > >>     start = DummyOperator(task_id="start")
> > > > > >>
> > > > > >>     with TaskGroup("section_1") as section_1:
> > > > > >>         create_section()
> > > > > >>
> > > > > >>     some_other_task = DummyOperator(task_id="some-other-task")
> > > > > >>
> > > > > >>     with TaskGroup("section_2") as section_2:
> > > > > >>         create_section()
> > > > > >>
> > > > > >>     end = DummyOperator(task_id='end')
> > > > > >>
> > > > > >>     start >> section_1 >> some_other_task >> section_2 >> end
> > > > >>
> > > > >>
> > > > >> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
> > > > >> <gc...@twitter.com.invalid> wrote:
> > > > >>
> > > > > >>> Re graph times. That makes sense. Let me know what you find. We may be
> > > > > >>> able to contribute on the lazy loading part.
> > > > > >>>
> > > > > >>> Looking forward to see the updated AIP!
> > > > >>>
> > > > >>>
> > > > >>> Gerard Casas Saez
> > > > >>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > >>>
> > > > >>>
> > > > >>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <ka...@gmail.com>
> > > > wrote:
> > > > >>>
> > > > >>>> Permissions granted, let me know if you face any issues.
> > > > >>>>
> > > > >>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yu...@gmail.com>
> > > wrote:
> > > > >>>>
> > > > > >>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> > > > > >>>>>
> > > > > >>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > > >>>>>
> > > > > >>>>>> What's your ID i.e. if you haven't created an account yet, please
> > > > > >>>>>> create one at https://cwiki.apache.org/confluence/signup.action and
> > > > > >>>>>> send us your ID and we will add permissions.
> > > > > >>>>>>
> > > > > >>>>>>> Thanks. I'll edit the AIP. May I request permission to edit it?
> > > > > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > > > >>>>>>
> > > > >>>>>>
> > > > > >>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yuqian1990@gmail.com> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Re, Xinbin. Thanks. I'll edit the AIP. May I request permission to
> > > > > >>>>>>> edit it?
> > > > > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > > > > >>>>>>>
> > > > > >>>>>>> Re Gerard: yes the UI loads all the nodes as json from the web server
> > > > > >>>>>>> at once. However, it only adds the top level nodes and edges to the
> > > > > >>>>>>> graph when the Graph View page is first opened, and then adds the
> > > > > >>>>>>> expanded nodes to the graph as the user expands them. From what I've
> > > > > >>>>>>> experienced with DAGs containing around 400 tasks (not using TaskGroup
> > > > > >>>>>>> or SubDagOperator), opening the whole dag in Graph View usually takes
> > > > > >>>>>>> 5 seconds. Less than 60ms of that is taken by loading the data from
> > > > > >>>>>>> webserver. The remaining 4.9s+ is taken by javascript functions in
> > > > > >>>>>>> dagre-d3.min.js such as createNodes, createEdgeLabels, etc and by
> > > > > >>>>>>> rendering the graph. With TaskGroup being used to group tasks into a
> > > > > >>>>>>> smaller number of top-level nodes, the amount of data loaded from
> > > > > >>>>>>> webserver will remain about the same compared to a flat dag of the
> > > > > >>>>>>> same size, but the number of nodes and edges needed to be plotted on
> > > > > >>>>>>> the graph can be reduced significantly. So in theory this should speed
> > > > > >>>>>>> up the time it takes to open Graph View even without lazy-loading the
> > > > > >>>>>>> data (I'll experiment to find out). That said, if it comes to a point
> > > > > >>>>>>> where lazy-loading helps, we can still implement it as an improvement.
> > > > > >>>>>>>
> > > > > >>>>>>> Re James: the Tree View looks as if all the groups are fully expanded
> > > > > >>>>>>> (because under the hood all the tasks are in a single DAG). I'm less
> > > > > >>>>>>> worried about Tree View at the moment because it already has a
> > > > > >>>>>>> mechanism for collapsing tasks by the dependency tree. That said, the
> > > > > >>>>>>> Tree View can definitely be improved too with TaskGroup (e.g. collapse
> > > > > >>>>>>> tasks in the same TaskGroup when Tree View is first opened).
> > > > > >>>>>>>
> > > > > >>>>>>> For both suggestions, implementing them doesn't require fundamental
> > > > > >>>>>>> changes to the idea. I think we can have a basic working TaskGroup
> > > > > >>>>>>> first, and then improve it incrementally in several PRs as we get more
> > > > > >>>>>>> feedback from the community. What do you think?
> > > > > >>>>>>>
> > > > > >>>>>>> Qian
> > > > >>>>>>>
> > > > >>>>>>> Qian
> > > > >>>>>>>
> > > > >>>>>>>
> > > > > >>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jcoder01@gmail.com> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> I agree this looks great, one question, how does the tree view look?
> > > > >>>>>>>>
> > > > >>>>>>>> James Coder
> > > > >>>>>>>>
> > > > > >>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gcasassaez@twitter.com.invalid> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> First of all, this is awesome!!
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Secondly, checking your UI code, it seems you are loading all
> > > > > >>>>>>>>> operators at once. Wondering if we can load them as needed (aka load
> > > > > >>>>>>>>> whenever we click the TaskGroup). Some of our DAGs are so large that
> > > > > >>>>>>>>> they take forever to load on the Graph view, so I'm worried about
> > > > > >>>>>>>>> this still being an issue here. It may be easily solvable by
> > > > > >>>>>>>>> implementing lazy loading of the graph. Not sure how easy it is to
> > > > > >>>>>>>>> implement/add to the UI extension (and I don't want to push for
> > > > > >>>>>>>>> early optimization as it's the root of all evil).
> > > > >>>>>>>>> Gerard Casas Saez
> > > > >>>>>>>>> Twitter | Cortex | @casassaez <
> http://twitter.com/casassaez>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > > >>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Hi Yu,
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Thank you so much for taking on this. I was fairly distracted
> > > > > >>>>>>>>>> previously and I didn't have the time to update the proposal. In
> > > > > >>>>>>>>>> fact, after discussing with Ash, Kaxil and Daniel, the direction of
> > > > > >>>>>>>>>> this AIP has been changed to favor the concept of TaskGroup instead
> > > > > >>>>>>>>>> of rewriting SubDagOperator (though it may make sense to deprecate
> > > > > >>>>>>>>>> SubDag at a future date).
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Your PR is amazing and it has implemented the desired features. I
> > > > > >>>>>>>>>> think we can focus on your new PR instead. Do you mind updating the
> > > > > >>>>>>>>>> AIP based on what you have done in your PR?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Best,
> > > > > >>>>>>>>>> Bin
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <
> > > > >>> yuqian1990@gmail.com>
> > > > >>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Hi, all, I've added the basic UI changes to my proposed
> > > > >>>>>>> implementation
> > > > >>>>>>>> of
> > > > >>>>>>>>>>> TaskGroup as UI grouping concept:
> > > > >>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup,
> > > > >>>>>>>>>>> so I'm quoting it here. The only thing I don't fully agree
> > > > >>>>>>>>>>> with is the restriction "... *cannot* have dependencies
> > > > >>>>>>>>>>> between a Task in a TaskGroup and either a Task in a
> > > > >>>>>>>>>>> different TaskGroup or a Task not in any group". I think
> > > > >>>>>>>>>>> this is overly restrictive. Since TaskGroup is a UI concept,
> > > > >>>>>>>>>>> tasks can have dependencies on tasks in other TaskGroups or
> > > > >>>>>>>>>>> on tasks not in any TaskGroup. In my PR, this is allowed.
> > > > >>>>>>>>>>> The graph edges update accordingly when TaskGroups are
> > > > >>>>>>>>>>> expanded/collapsed. TaskGroup only helps make the UI look
> > > > >>>>>>>>>>> less crowded. Under the hood, everything is still a DAG of
> > > > >>>>>>>>>>> tasks and edges, so things work normally. Here's a screenshot
> > > > >>>>>>>>>>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > > > >>>>>>>>>>> of the UI interaction.
> > > > >>>>>>>>>>>
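
A minimal sketch of how the drawn edges could be recomputed when groups are collapsed, assuming the DAG stays flat and each task only carries a group label. The function name `visible_edges` and its arguments are hypothetical and are not taken from the PR; the sketch also assumes group ids never collide with task ids.

```python
def visible_edges(edges, group_of, collapsed):
    """Return the edges to draw when the groups in `collapsed` are folded.

    edges:     iterable of (upstream_task_id, downstream_task_id)
    group_of:  dict mapping task_id -> group_id (ungrouped tasks are absent)
    collapsed: set of group_ids currently rendered as a single node
    """
    def node(task_id):
        # A task inside a collapsed group is represented by the group itself.
        group = group_of.get(task_id)
        return group if group in collapsed else task_id

    drawn = set()
    for up, down in edges:
        a, b = node(up), node(down)
        if a != b:  # edges internal to a collapsed group are not drawn
            drawn.add((a, b))
    return sorted(drawn)
```

Because the underlying edges never change, expanding a group again is just a call with a smaller `collapsed` set; nothing about scheduling is involved.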
> > > > >>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > >>>>>>>>>>> - You *can* have dependencies between Tasks in the same
> > > > >>>>>>>>>>>   TaskGroup, but *cannot* have dependencies between a Task in
> > > > >>>>>>>>>>>   a TaskGroup and either a Task in a different TaskGroup or a
> > > > >>>>>>>>>>>   Task not in any group
> > > > >>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either
> > > > >>>>>>>>>>>   other TaskGroups or Tasks not in any group
> > > > >>>>>>>>>>> - The UI will by default render a TaskGroup as a single
> > > > >>>>>>>>>>>   "object", but which you expand or zoom into in some way
> > > > >>>>>>>>>>> - You'd need some way to determine what the "status" of a
> > > > >>>>>>>>>>>   TaskGroup was at least for UI display purposes
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Regarding Jake's comment, I agree it's possible to implement
> > > > >>>>>>>>>>> the "retrying tasks in a group" pattern he mentioned as an
> > > > >>>>>>>>>>> optional feature of TaskGroup, although that may go against
> > > > >>>>>>>>>>> having TaskGroup as a pure UI concept. For the motivating
> > > > >>>>>>>>>>> example Jake provided, I suggest implementing both
> > > > >>>>>>>>>>> SubmitLongRunningJobTask and PollJobStatusSensor in a single
> > > > >>>>>>>>>>> operator. It can do something like what
> > > > >>>>>>>>>>> BaseSensorOperator.execute() does in "reschedule" mode, i.e.
> > > > >>>>>>>>>>> it first executes some code to submit the long-running job
> > > > >>>>>>>>>>> to the external service and stores the state (e.g. in XCom),
> > > > >>>>>>>>>>> then reschedules itself. Subsequent runs then poke for the
> > > > >>>>>>>>>>> completion state.
> > > > >>>>>>>>>>>
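
The submit-then-poll idea above can be modelled without Airflow as a small state machine. This is only a sketch under stated assumptions: the class `SubmitAndPoll`, the `state` dict (standing in for XCom) and the string return values are all hypothetical, and a real operator would signal rescheduling through Airflow's own mechanism rather than a return value.

```python
class SubmitAndPoll:
    """Toy stand-in for one operator that submits a job, then polls it."""

    def __init__(self, submit, is_done):
        self.submit = submit      # callable() -> job_id
        self.is_done = is_done    # callable(job_id) -> bool

    def execute(self, state):
        """Run one attempt; `state` persists between attempts (like XCom)."""
        if "job_id" not in state:
            # First attempt: submit the job and remember its id.
            state["job_id"] = self.submit()
            return "reschedule"
        if self.is_done(state["job_id"]):
            return "success"
        # Not finished yet: give the slot back and try again later.
        return "reschedule"
```

If polling fails for a retryable reason, clearing `state` before the retry would make the next attempt submit again, which is the behaviour Jake wanted from the SubDag.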
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> > > > >>>>>>>>>> <jferriero@google.com.invalid
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> I really like this idea of a TaskGroup container as I
> > > > >> think
> > > > >>>> this
> > > > >>>>>>> will
> > > > >>>>>>>>>> be
> > > > >>>>>>>>>>>> much easier to use than SubDag.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> I'd like to propose an optional behavior for special
> retry
> > > > >>>>>> mechanics
> > > > >>>>>>>>>> via
> > > > >>>>>>>>>>> a
> > > > >>>>>>>>>>>> TaskGroup.retry_all property.
> > > > >>>>>>>>>>>> This way I could use TaskGroup to replace my favorite
> use
> > > > >> of
> > > > >>>>>> SubDag
> > > > >>>>>>>> for
> > > > >>>>>>>>>>>> atomically retrying tasks of the pattern "act on
> external
> > > > >>>> state
> > > > >>>>>> then
> > > > >>>>>>>>>>>> reschedule poll until desired state reached".
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> The motivating use case I have for a SubDag is a very simple
> > > > >>>>>>>>>>>> two-task group [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > > >>>>>>>>>>>> I use SubDag because it gives me an easy way to retry the
> > > > >>>>>>>>>>>> SubmitJobTask if something about the PollJobSensor fails.
> > > > >>>>>>>>>>>> This pattern would be really nice for jobs that are expected
> > > > >>>>>>>>>>>> to run a long time (because the sensor can use reschedule
> > > > >>>>>>>>>>>> mode, freeing up slots) but might fail for a retryable reason.
> > > > >>>>>>>>>>>> However, using SubDag to meet this use case defeats the
> > > > >>>>>>>>>>>> purpose because SubDag infamously
> > > > >>>>>>>>>>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> > > > >>>>>>>>>>>> blocks a "controller" slot for the entire duration.
> > > > >>>>>>>>>>>> This may feel like cyclic behavior, but in reality it is very
> > > > >>>>>>>>>>>> common for a single operator to submit a job and wait until
> > > > >>>>>>>>>>>> it is done.
> > > > >>>>>>>>>>>> We could use this case to refactor many operators (e.g. BQ,
> > > > >>>>>>>>>>>> Dataproc, Dataflow) to be implemented as
> > > > >>>>>>>>>>>> TaskGroup[SubmitTask >> PollTask] with an optional reschedule
> > > > >>>>>>>>>>>> mode if the user knows that this job may take a long time.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> I'd be happy to do the development work on adding this
> > > > >>>>>>>>>>>> specific retry behavior to TaskGroup once the base concept is
> > > > >>>>>>>>>>>> implemented, if others in the community would find this a
> > > > >>>>>>>>>>>> useful feature.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Cheers,
> > > > >>>>>>>>>>>> Jake
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> > > > >>>>>>>> Jarek.Potiuk@polidea.com
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> All for it :) . I think we are getting closer to have
> > > > >>> regular
> > > > >>>>>>>>>> planning
> > > > >>>>>>>>>>>> and
> > > > >>>>>>>>>>>>> making some structured approach to 2.0 and starting
> task
> > > > >>>> force
> > > > >>>>>> for
> > > > >>>>>>> it
> > > > >>>>>>>>>>>> soon,
> > > > >>>>>>>>>>>>> so I think this should be perfectly fine to discuss and
> > > > >>> even
> > > > >>>>>> start
> > > > >>>>>>>>>>>>> implementing what's beyond as soon as we make sure that
> > > > >> we
> > > > >>>> are
> > > > >>>>>>>>>>>> prioritizing
> > > > >>>>>>>>>>>>> 2.0 work.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> J,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <
> > > > >>>> yuqian1990@gmail.com>
> > > > >>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Hi Jarek,
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I agree we should not change the behaviour of the
> > > > >> existing
> > > > >>>>>>>>>>>> SubDagOperator
> > > > >>>>>>>>>>>>>> till Airflow 2.1. Is it okay to continue the
> discussion
> > > > >>>> about
> > > > >>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>> as
> > > > >>>>>>>>>>>>>> a brand new concept/feature independent from the
> > > > >> existing
> > > > >>>>>>>>>>>> SubDagOperator?
> > > > >>>>>>>>>>>>>> In other words, shall we add TaskGroup as a UI grouping
> > > > >>>>>>>>>>>>>> concept like Ash suggested, and not touch SubDagOperator at
> > > > >>>>>>>>>>>>>> all? Whenever we are ready with TaskGroup, we then deprecate
> > > > >>>>>>>>>>>>>> SubDagOperator in Airflow 2.1.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I really like Ash's idea of simplifying the
> > > > >> SubDagOperator
> > > > >>>>> idea
> > > > >>>>>>>>>> into
> > > > >>>>>>>>>>> a
> > > > >>>>>>>>>>>>>> simple UI grouping concept. I think Xinbin's idea of
> > > > >>>>>> "reattaching
> > > > >>>>>>>>>> all
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>> tasks to the root DAG" is the way to go. And I see
> James
> > > > >>>>> pointed
> > > > >>>>>>>>>> out
> > > > >>>>>>>>>>> we
> > > > >>>>>>>>>>>>>> need some helper functions to simplify dependencies
> > > > >>> setting
> > > > >>>> of
> > > > >>>>>>>>>>>> TaskGroup.
> > > > >>>>>>>>>>>>>> Xinbin put up a pretty elegant example in his PR
> > > > >>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I
> think
> > > > >>>> having
> > > > >>>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>> as
> > > > >>>>>>>>>>>>>> a UI concept should be a relatively small change. We
> can
> > > > >>>>>> simplify
> > > > >>>>>>>>>>>>> Xinbin's
> > > > >>>>>>>>>>>>>> PR further. So I put up this alternative proposal
> here:
> > > > >>>>>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I have not done any UI changes due to lack of
> experience
> > > > >>>> with
> > > > >>>>>> web
> > > > >>>>>>>>>> UI.
> > > > >>>>>>>>>>>> If
> > > > >>>>>>>>>>>>>> anyone's interested, please take a look at the PR.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Qian
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > > > >>>>>>>>>>> Jarek.Potiuk@polidea.com
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Similar point here to the other ideas that are
> popping
> > > > >>> up.
> > > > >>>>>> Maybe
> > > > >>>>>>>>>> we
> > > > >>>>>>>>>>>>>> should
> > > > >>>>>>>>>>>>>>> just focus on completing 2.0 and make all discussions
> > > > >>> about
> > > > >>>>>>>>>> further
> > > > >>>>>>>>>>>>>>> improvements to 2.1? While those are important
> > > > >>> discussions
> > > > >>>>> (and
> > > > >>>>>>>>>> we
> > > > >>>>>>>>>>>>> should
> > > > >>>>>>>>>>>>>>> continue them in the  near future !) I think at this
> > > > >>> point
> > > > >>>>>>>>>> focusing
> > > > >>>>>>>>>>>> on
> > > > >>>>>>>>>>>>>>> delivering 2.0 in its current shape should be our
> focus
> > > > >>>> now ?
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> J.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > > > >>>>>>>>>>> bin.huangxb@gmail.com>
> > > > >>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Hi Daniel
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same API
> > > > >> as a
> > > > >>>> DAG
> > > > >>>>>>>>>>> object
> > > > >>>>>>>>>>>>>>> related
> > > > >>>>>>>>>>>>>>>> to task dependencies, but it will not have anything
> > > > >>>> related
> > > > >>>>> to
> > > > >>>>>>>>>>>> actual
> > > > >>>>>>>>>>>>>>>> execution or scheduling.
> > > > >>>>>>>>>>>>>>>> I will update the AIP according to this over the
> > > > >>> weekend.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when
> > > > >> you
> > > > >>>>>>>>>> import
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> object
> > > > >>>>>>>>>>>>>>>> you can import it with parameters to determine the
> > > > >> shape
> > > > >>>> of
> > > > >>>>>> the
> > > > >>>>>>>>>>>> DAG.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it serve
> a
> > > > >>>>> similar
> > > > >>>>>>>>>>>> purpose
> > > > >>>>>>>>>>>>>> as
> > > > >>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>> DAG factory function?
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > > > >>>>>>>>>>>>>>> daniel.imberman@gmail.com
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Hi Bin,
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG object
> > > > >>>>>>>>>>>>>>>>> (e.g. the bitwise operator for task dependencies)? We
> > > > >>>>>>>>>>>>>>>>> could even make a “DAGTemplate” object s.t. when you
> > > > >>>>>>>>>>>>>>>>> import the object you can import it with parameters to
> > > > >>>>>>>>>>>>>>>>> determine the shape of the DAG.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > > > >>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>> The TaskGroup will not take a schedule interval as a
> > > > >>>>>>>>>>>>>>>>> parameter itself; it depends on the DAG it is attached
> > > > >>>>>>>>>>>>>>>>> to. In my opinion, the TaskGroup will only contain a
> > > > >>>>>>>>>>>>>>>>> group of tasks with interdependencies, and the TaskGroup
> > > > >>>>>>>>>>>>>>>>> behaves like a task. It doesn't contain any
> > > > >>>>>>>>>>>>>>>>> execution/scheduling logic (i.e. schedule_interval,
> > > > >>>>>>>>>>>>>>>>> concurrency, max_active_runs, etc.) like a DAG does.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> For example, there is the scenario that the
> schedule
> > > > >>>>>>>>>> interval
> > > > >>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>> DAG
> > > > >>>>>>>>>>>>>>> is
> > > > >>>>>>>>>>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20
> > > > >>> min.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case
> > > > >> that
> > > > >>>> you
> > > > >>>>>>>>>> want
> > > > >>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>> achieve?
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> > > > >>>>>>>>>> thanosxnicholas@gmail.com
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Hi Bin,
> > > > >>>>>>>>>>>>>>>>>> Using TaskGroup, is the schedule interval of TaskGroup
> > > > >>>>>>>>>>>>>>>>>> the same as that of the parent DAG? My main concern is
> > > > >>>>>>>>>>>>>>>>>> whether the schedule interval of TaskGroup could be
> > > > >>>>>>>>>>>>>>>>>> different from that of the DAG. For example, there is
> > > > >>>>>>>>>>>>>>>>>> the scenario that the schedule interval of DAG is 1
> > > > >>>>>>>>>>>>>>>>>> hour and the schedule interval of TaskGroup is 20 min.
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Cheers,
> > > > >>>>>>>>>>>>>>>>>> Nicholas
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > > > >>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Hi Nicholas,
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of
> > > > >>> SubDagOperator,
> > > > >>>>>>>>>>> maybe
> > > > >>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>>>> throw
> > > > >>>>>>>>>>>>>>>>>>> an error? But in the original proposal, the
> > > > >> subdag's
> > > > >>>>>>>>>>>>>>>> schedule_interval
> > > > >>>>>>>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to
> > > > >>> replace
> > > > >>>>>>>>>>>> SubDag,
> > > > >>>>>>>>>>>>>>> there
> > > > >>>>>>>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>>>>> be no subdag schedule_interval.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > > > >>>>>>>>>>>> thanosxnicholas@gmail.com
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> Hi Bin,
> > > > >>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was confused about
> > > > >>>>>>>>>>>>>>>>>>>> whether the schedule interval of a SubDAG can differ
> > > > >>>>>>>>>>>>>>>>>>>> from that of the parent DAG. I have discussed the
> > > > >>>>>>>>>>>>>>>>>>>> schedule interval of SubDAG with Jiajie Zhong. If the
> > > > >>>>>>>>>>>>>>>>>>>> SubDagOperator has a different schedule interval,
> > > > >>>>>>>>>>>>>>>>>>>> what will happen when the scheduler schedules the
> > > > >>>>>>>>>>>>>>>>>>>> parent DAG?
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> Regards,
> > > > >>>>>>>>>>>>>>>>>>>> Nicholas Jiang
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > > >>>>>>>>>>>>>>>> bin.huangxb@gmail.com>
> > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone for the feedback!
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> I have rethought the concept of subdag and task
> > > > >>>>>>>>>>>>>>>>>>>>> groups. I think the better way to approach this is
> > > > >>>>>>>>>>>>>>>>>>>>> to entirely remove subdag and introduce the concept
> > > > >>>>>>>>>>>>>>>>>>>>> of TaskGroup, which is a container of tasks along
> > > > >>>>>>>>>>>>>>>>>>>>> with their dependencies *without the
> > > > >>>>>>>>>>>>>>>>>>>>> execution/scheduling logic of a DAG*. Its only
> > > > >>>>>>>>>>>>>>>>>>>>> purpose is to group a list of tasks, but you still
> > > > >>>>>>>>>>>>>>>>>>>>> need to add it to a DAG for execution.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Here is a small code snippet.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> ```
> > > > >>>>>>>>>>>>>>>>>>>>> class TaskGroup:
> > > > >>>>>>>>>>>>>>>>>>>>>     """
> > > > >>>>>>>>>>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> > > > >>>>>>>>>>>>>>>>>>>>>     """
> > > > >>>>>>>>>>>>>>>>>>>>>     def __init__(self, group_id, default_args):
> > > > >>>>>>>>>>>>>>>>>>>>>         pass
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> """
> > > > >>>>>>>>>>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to a DAG
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> This can be declared in a separate file from the dag file
> > > > >>>>>>>>>>>>>>>>>>>>> """
> > > > >>>>>>>>>>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
> > > > >>>>>>>>>>>>>>>>>>>>> download_group.add_task(task1)
> > > > >>>>>>>>>>>>>>>>>>>>> task2.dag = download_group
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> with download_group:
> > > > >>>>>>>>>>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> [task1, task2] >> task3
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> """Add it to a DAG for execution"""
> > > > >>>>>>>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> > > > >>>>>>>>>>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
> > > > >>>>>>>>>>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> > > > >>>>>>>>>>>>>>>>>>>>>     start >> download_group
> > > > >>>>>>>>>>>>>>>>>>>>>     # this is equivalent to
> > > > >>>>>>>>>>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> > > > >>>>>>>>>>>>>>>>>>>>> ```
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> With this, we can still reuse a group of tasks
> > > > >> and
> > > > >>>>>>>>>> set
> > > > >>>>>>>>>>>>>>>> dependencies
> > > > >>>>>>>>>>>>>>>>>>>> between
> > > > >>>>>>>>>>>>>>>>>>>>> them; it avoids the boilerplate code from using
> > > > >>>>>>>>>>>>>> SubDagOperator,
> > > > >>>>>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>>> we
> > > > >>>>>>>>>>>>>>>>>>>> can
> > > > >>>>>>>>>>>>>>>>>>>>> declare dependencies as `task >> task_group >>
> > > > >>> task`.
> > > > >>>>>>>>>>>>>>>>>>>>>
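
One way `task >> task_group >> task` could resolve to plain task-to-task edges is sketched below, purely as an illustration. The classes `Node`, `Task` and `Group` are hypothetical stand-ins written for this note; they are not Airflow classes and not the API proposed in the snippet above.

```python
class Node:
    """Anything that can appear on either side of `>>`."""

    def roots(self):      # tasks that should receive incoming edges
        raise NotImplementedError

    def leaves(self):     # tasks that should emit outgoing edges
        raise NotImplementedError

    def __rshift__(self, other):
        for up in self.leaves():
            for down in other.roots():
                up.downstream.add(down.task_id)
        return other      # allows chaining: a >> b >> c


class Task(Node):
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = set()   # ids of tasks that run after this one

    def roots(self):
        return [self]

    def leaves(self):
        return [self]


class Group(Node):
    def __init__(self, group_id, tasks):
        self.group_id = group_id
        self.tasks = list(tasks)

    def roots(self):
        # Tasks with no upstream task inside the group.
        inner = {tid for t in self.tasks for tid in t.downstream}
        return [t for t in self.tasks if t.task_id not in inner]

    def leaves(self):
        # Tasks with no downstream task inside the group.
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.downstream & ids)]
```

With this shape the group holds no edges of its own, so the scheduler would only ever see ordinary task dependencies.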
> > > > >>>>>>>>>>>>>>>>>>>>> User migration wise, we can introduce it before
> > > > >>>>>>>>>> Airflow
> > > > >>>>>>>>>>>> 2.0
> > > > >>>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>> allow
> > > > >>>>>>>>>>>>>>>>>>>>> gradual transition. Then we can decide if we
> > > > >> still
> > > > >>>>>>>>>> want
> > > > >>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>> keep
> > > > >>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>> SubDagOperator or simply remove it.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Any thoughts?
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Cheers,
> > > > >>>>>>>>>>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime
> > > > >> Beauchemin <
> > > > >>>>>>>>>>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> +1, proposal looks good.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> The original intention was really to have task
> > > > >>>>>>>>>>>>>>>>>>>>>> groups and a zoom-in/out in the UI. The original
> > > > >>>>>>>>>>>>>>>>>>>>>> reasoning was to reuse the DAG object since it is a
> > > > >>>>>>>>>>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> > > > >>>>>>>>>>>>>>>>>>>>>> create underlying confusions since a DAG is much
> > > > >>>>>>>>>>>>>>>>>>>>>> more than just a group of tasks.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> Max
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima
> Joshi <
> > > > >>>>>>>>>>>>>>>>>>>>> joshipoornima06@gmail.com>
> > > > >>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Thank you for your email.
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin
> Huang <
> > > > >>>>>>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*:
> This
> > > > >>>>>>>>>>>>>> rewrites
> > > > >>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > > > >>>>>>>>>> it
> > > > >>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>> give a
> > > > >>>>>>>>>>>>>>>>>>>> flat
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already
> > > > >>>>>>>>>> does
> > > > >>>>>>>>>>>>> this I
> > > > >>>>>>>>>>>>>>>>> think.
> > > > >>>>>>>>>>>>>>>>>> At
> > > > >>>>>>>>>>>>>>>>>>>>> least
> > > > >>>>>>>>>>>>>>>>>>>>>>> if
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> I am not sure about the serialized_dag
> > > > >>>>>>>>>>>>>>>>>>>>>>>> representation, but at least it will still keep
> > > > >>>>>>>>>>>>>>>>>>>>>>>> the subdag entry in the DAG table? In my
> > > > >>>>>>>>>>>>>>>>>>>>>>>> proposal, as also in the draft PR, the idea is to
> > > > >>>>>>>>>>>>>>>>>>>>>>>> *extract the tasks from the subdag and add them
> > > > >>>>>>>>>>>>>>>>>>>>>>>> back to the root_dag*. So the runtime DAG graph
> > > > >>>>>>>>>>>>>>>>>>>>>>>> will look exactly the same as without subdag but
> > > > >>>>>>>>>>>>>>>>>>>>>>>> with metadata attached to those sections. This
> > > > >>>>>>>>>>>>>>>>>>>>>>>> metadata will later be used for rendering in the
> > > > >>>>>>>>>>>>>>>>>>>>>>>> UI. So after parsing (*DagBag.process_file()*),
> > > > >>>>>>>>>>>>>>>>>>>>>>>> it will just output the *root_dag* instead of
> > > > >>>>>>>>>>>>>>>>>>>>>>>> *root_dag + subdag + subdag + nested subdag* etc.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > > > >>>>>>>>>>>>>>>>>>>>>>>> current_group=section-1,
> > > > >>>>>>>>>>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (naming suggestions
> > > > >>>>>>>>>>>>>>>>>>>>>>>> welcome). The reason for parent_group is that we
> > > > >>>>>>>>>>>>>>>>>>>>>>>> can have nested groups and still be able to
> > > > >>>>>>>>>>>>>>>>>>>>>>>> capture the dependency.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
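
As an illustration of the metadata described above, here is a small sketch that walks a nested grouping and emits one flat record per task. The `flatten` helper and the nested-dict input format are assumptions made for this note; only the field names current_group and parent_group come from the message.

```python
def flatten(group_id, tree, parent_group=None):
    """Yield (task_id, current_group, parent_group) for every task in `tree`.

    `tree` maps a task id to None for a plain task, or a group id to a
    nested dict of the same shape for a nested group.
    """
    for name, child in tree.items():
        if isinstance(child, dict):
            # Descend into the nested group; this group becomes its parent.
            yield from flatten(name, child, parent_group=group_id)
        else:
            yield (name, group_id, parent_group)
```

The result is a flat task list, as the scheduler would see it, while the two group fields keep enough information for a UI to rebuild the nesting.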
> > > > >>>>>>>>>>>>>>>>>>>>>>>> Runtime DAG:
> > > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> While at the UI, what we see would be
> > > > >> something
> > > > >>>>>>>>>>>> like
> > > > >>>>>>>>>>>>>> this
> > > > >>>>>>>>>>>>>>>> by
> > > > >>>>>>>>>>>>>>>>>>>>> utilizing
> > > > >>>>>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>> metadata, and then we can expand or zoom
> into
> > > > >>>>>>>>>> in
> > > > >>>>>>>>>>>> some
> > > > >>>>>>>>>>>>>>> way.
> > > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> The benefits I can see are:
> > > > >>>>>>>>>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > > > >>>>>>>>>>>>>>>>>>>>>>>> complexity of SubDag for execution and
> > > > >>>>>>>>>>>>>>>>>>>>>>>> scheduling. It will be the same as not using
> > > > >>>>>>>>>>>>>>>>>>>>>>>> SubDag.
> > > > >>>>>>>>>>>>>>>>>>>>>>>> 2. We still have the benefits of modularized and
> > > > >>>>>>>>>>>>>>>>>>>>>>>> reusable dag code and can declare dependencies
> > > > >>>>>>>>>>>>>>>>>>>>>>>> between them. And with the new SubDagOperator
> > > > >>>>>>>>>>>>>>>>>>>>>>>> (see AIP or draft PR), we can use the same
> > > > >>>>>>>>>>>>>>>>>>>>>>>> dag_factory function for generating 1 dag, a lot
> > > > >>>>>>>>>>>>>>>>>>>>>>>> of dynamic dags, or for SubDag (in this case, it
> > > > >>>>>>>>>>>>>>>>>>>>>>>> will just extract all underlying tasks and append
> > > > >>>>>>>>>>>>>>>>>>>>>>>> them to the root dag).
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> > > > >>>>>>>>>>>>>>>>>>>>>>>> with a simpler concept by Ash: the proposed
> > > > >>>>>>>>>>>>>>>>>>>>>>>> change basically drains out the contents of a
> > > > >>>>>>>>>>>>>>>>>>>>>>>> SubDag and becomes more like
> > > > >>>>>>>>>>>>>>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > > > >>>>>>>>>>>>>>>>>>>>>>>> (forgive me for the crazy name...). In this case,
> > > > >>>>>>>>>>>>>>>>>>>>>>>> is it still necessary to keep the concept of
> > > > >>>>>>>>>>>>>>>>>>>>>>>> subdag, as it is nothing more than a name?
> > > > >>>>>>>>>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up.
> Thanks
> > > > >>>>>>>>>>>> Chris
> > > > >>>>>>>>>>>>>>> Palmer
> > > > >>>>>>>>>>>>>>>>> for
> > > > >>>>>>>>>>>>>>>>>>>>> helping
> > > > >>>>>>>>>>>>>>>>>>>>>>>> conceptualize the functionality of
> TaskGroup,
> > > > >> I
> > > > >>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>> just
> > > > >>>>>>>>>>>>>>>>> paste
> > > > >>>>>>>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>>>>>>>> here.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > > > >>>>>>>>>> in
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>> same
> > > > >>>>>>>>>>>>>>>>>>>> TaskGroup,
> > > > >>>>>>>>>>>>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task
> in
> > > > >>>>>>>>>> a
> > > > >>>>>>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>>>>>>> either a
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > > > >>>>>>>>>> in
> > > > >>>>>>>>>>>> any
> > > > >>>>>>>>>>>>>>> group
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > >>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>>> either
> > > > >>>>>>>>>>>>>>>>>>>>> other
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > >>>>>>>>>> as
> > > > >>>>>>>>>>> a
> > > > >>>>>>>>>>>>>> single
> > > > >>>>>>>>>>>>>>>>>>>> "object",
> > > > >>>>>>>>>>>>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > >>>>>>>>>>>>> "status"
> > > > >>>>>>>>>>>>>>> of a
> > > > >>>>>>>>>>>>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>>>>>>>>>>>> was
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> I agree with Chris:
> > > > >>>>>>>>>>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > > > >>>>>>>>>>> executor), I
> > > > >>>>>>>>>>>>>> think
> > > > >>>>>>>>>>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>>>>>>>>>>>> should
> > > > >>>>>>>>>>>>>>>>>>>>>>>> be ignored during execution. (unless we
> decide
> > > > >>>>>>>>>> to
> > > > >>>>>>>>>>>>>>> implement
> > > > >>>>>>>>>>>>>>>>>> some
> > > > >>>>>>>>>>>>>>>>>>>>>> metadata
> > > > >>>>>>>>>>>>>>>>>>>>>>>> operations that allow starting/stopping a group of
> > > > >>>>>>>>>>> tasks
> > > > >>>>>>>>>>>>>> etc.)
> > > > >>>>>>>>>>>>>>>>>>>>>>>> - From the UI's View, it should be able to
> > > > >> pick
> > > > >>>>>>>>>>> up
> > > > >>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>> individual
> > > > >>>>>>>>>>>>>>>>>>>>>> tasks'
> > > > >>>>>>>>>>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > > > >>>>>>>>>> status
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > > > >>>>>>>>>> Imberman
> > > > >>>>>>>>>>> <
> > > > >>>>>>>>>>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>`
> > > > >> operator
> > > > >>>>>>>>>>> to
> > > > >>>>>>>>>>>>> tie
> > > > >>>>>>>>>>>>>>> dags
> > > > >>>>>>>>>>>>>>>>>>>> together
> > > > >>>>>>>>>>>>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>>>>>>>>>>> I
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if
> > > > >> we
> > > > >>>>>>>>>>>> could
> > > > >>>>>>>>>>>>>>>>>> essentially
> > > > >>>>>>>>>>>>>>>>>>>>> write
> > > > >>>>>>>>>>>>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > > > >>>>>>>>>>>> starter-tasks
> > > > >>>>>>>>>>>>>> for
> > > > >>>>>>>>>>>>>>>>> that
> > > > >>>>>>>>>>>>>>>>>>> DAG.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a
> mostly
> > > > >>>>>>>>>> UI
> > > > >>>>>>>>>>>>>> concept.
> > > > >>>>>>>>>>>>>>>> It
> > > > >>>>>>>>>>>>>>>>>>>> doesn’t
> > > > >>>>>>>>>>>>>>>>>>>>>> need
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> to execute separately, you’re just adding
> > > > >> more
> > > > >>>>>>>>>>>> tasks
> > > > >>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>> queue
> > > > >>>>>>>>>>>>>>>>>>>>> that
> > > > >>>>>>>>>>>>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> be executed when there are resources
> > > > >>>>>>>>>> available.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris
> Palmer
> > > > >>>>>>>>>> <
> > > > >>>>>>>>>>>>>>>>>>> chris@crpalmer.com
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
> > > > >>>>>>>>>>>>>> abstraction.
> > > > >>>>>>>>>>>>>>> I
> > > > >>>>>>>>>>>>>>>>>> think
> > > > >>>>>>>>>>>>>>>>>>>> what
> > > > >>>>>>>>>>>>>>>>>>>>>> is
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
> > > > >>>>>>>>>> high
> > > > >>>>>>>>>>>>> level
> > > > >>>>>>>>>>>>>> I
> > > > >>>>>>>>>>>>>>>>> think
> > > > >>>>>>>>>>>>>>>>>>> you
> > > > >>>>>>>>>>>>>>>>>>>>> want
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> this
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> functionality:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > > > >> in
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>> same
> > > > >>>>>>>>>>>>>>>>>>> TaskGroup,
> > > > >>>>>>>>>>>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task
> in
> > > > >> a
> > > > >>>>>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>>>>> either
> > > > >>>>>>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > > > >> in
> > > > >>>>>>>>>>> any
> > > > >>>>>>>>>>>>>> group
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > >>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>> either
> > > > >>>>>>>>>>>>>>>>>>> other
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > >>>>>>>>>> as a
> > > > >>>>>>>>>>>>>> single
> > > > >>>>>>>>>>>>>>>>>>> "object",
> > > > >>>>>>>>>>>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > >>>>>>>>>>>> "status"
> > > > >>>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>>>>>>>>>>> was
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
> > > > >>>>>>>>>>> object
> > > > >>>>>>>>>>>>>> with
> > > > >>>>>>>>>>>>>>>> its
> > > > >>>>>>>>>>>>>>>>>> own
> > > > >>>>>>>>>>>>>>>>>>>>>> database
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> table and model or just another attribute
> on
> > > > >>>>>>>>>>>> tasks.
> > > > >>>>>>>>>>>>> I
> > > > >>>>>>>>>>>>>>>> think
> > > > >>>>>>>>>>>>>>>>>> you
> > > > >>>>>>>>>>>>>>>>>>>>> could
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> build
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> it in a way such that from the scheduler's
> > > > >>>>>>>>>> point
> > > > >>>>>>>>>>> of
> > > > >>>>>>>>>>>>>> view
> > > > >>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>> DAG
> > > > >>>>>>>>>>>>>>>>>>> with
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
> > > > >>>>>>>>>> differently.
> > > > >>>>>>>>>>> So
> > > > >>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>> really
> > > > >>>>>>>>>>>>>>>>>>> just
> > > > >>>>>>>>>>>>>>>>>>>>>>> becomes
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> shortcut for setting dependencies between
> > > > >> sets
> > > > >>>>>>>>>>> of
> > > > >>>>>>>>>>>>>> Tasks,
> > > > >>>>>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>>>>> allows
> > > > >>>>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>> UI
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> to simplify the render of the DAG
> structure.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> Chris
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan
> Davydov
> > > > >>>>>>>>>>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
> > > > >>>>>>>>>> the
> > > > >>>>>>>>>>>> more
> > > > >>>>>>>>>>>>>>>>> important
> > > > >>>>>>>>>>>>>>>>>>>> issue
> > > > >>>>>>>>>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> fix),
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> but I am still convinced Ash's idea is the
> > > > >>>>>>>>>>> right
> > > > >>>>>>>>>>>>> way
> > > > >>>>>>>>>>>>>>>>> forward
> > > > >>>>>>>>>>>>>>>>>>>> (just
> > > > >>>>>>>>>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> might
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
> > > > >>>>>>>>>>> adding
> > > > >>>>>>>>>>>>>>> visual
> > > > >>>>>>>>>>>>>>>>>>> grouping
> > > > >>>>>>>>>>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> UI).
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> There was a previous thread about this FYI
> > > > >>>>>>>>>>> with
> > > > >>>>>>>>>>>>> more
> > > > >>>>>>>>>>>>>>>>> context
> > > > >>>>>>>>>>>>>>>>>>> on
> > > > >>>>>>>>>>>>>>>>>>>>> why
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> subdags
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > >>>>>>>>>>>>>>>>>>>>>> . A
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> solution I outline there to James's problem
> > > > >>>>>>>>>> is
> > > > >>>>>>>>>>>> e.g.
> > > > >>>>>>>>>>>>>>>>> enabling
> > > > >>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> operator
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as
> > > > >>>>>>>>>>>> well. I
> > > > >>>>>>>>>>>>>> see
> > > > >>>>>>>>>>>>>>>>> this
> > > > >>>>>>>>>>>>>>>>>>>> being
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> separate
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> from Ash's solution for DAG grouping in the
> > > > >>>>>>>>>> UI
> > > > >>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>> one
> > > > >>>>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>> two
> > > > >>>>>>>>>>>>>>>>>>>>>> items
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> required to replace all existing subdag
> > > > >>>>>>>>>>>>>> functionality.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
> > > > >>>>>>>>>> and
> > > > >>>>>>>>>>>>> they
> > > > >>>>>>>>>>>>>>> are
> > > > >>>>>>>>>>>>>>>>>>> always a
> > > > >>>>>>>>>>>>>>>>>>>>>> giant
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> pain
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> to use. They are a constant source of user
> > > > >>>>>>>>>>>>> confusion
> > > > >>>>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>>>>> breakages
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> during
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
> > > > >>>>>>>>>> Coder <
> > > > >>>>>>>>>>>>>>>>>>>> jcoder01@gmail.com>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a
> > > > >>>>>>>>>> UI
> > > > >>>>>>>>>>>>>>> concept. I
> > > > >>>>>>>>>>>>>>>>> use
> > > > >>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>> subdag
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If
> > > > >>>>>>>>>>> you
> > > > >>>>>>>>>>>>>> have a
> > > > >>>>>>>>>>>>>>>>> group
> > > > >>>>>>>>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>>>>>> tasks
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> that
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> need to finish before another group of
> > > > >>>>>>>>>> tasks
> > > > >>>>>>>>>>>>>> start,
> > > > >>>>>>>>>>>>>>>>> using
> > > > >>>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>>>> subdag
> > > > >>>>>>>>>>>>>>>>>>>>>>> is
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> pretty quick way to set those
> dependencies
> > > > >>>>>>>>>>>> and I
> > > > >>>>>>>>>>>>>>> think
> > > > >>>>>>>>>>>>>>>>>> also
> > > > >>>>>>>>>>>>>>>>>>>> make
> > > > >>>>>>>>>>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> easier
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> to follow the dag code.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
> > > > >>>>>>>>>> Hamlin
> > > > >>>>>>>>>>> <
> > > > >>>>>>>>>>>>>>>>>>>>> hamlin.kn@gmail.com>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > > >>>>>>>>>>>>>> Berlin-Taylor
> > > > >>>>>>>>>>>>>>> <
> > > > >>>>>>>>>>>>>>>>>>>>>> ash@apache.org
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Question:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator
> > > > >>>>>>>>>>>> anymore?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just
> > > > >>>>>>>>>>>>> replacing
> > > > >>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>>> with
> > > > >>>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>> UI
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> grouping
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less
> > > > >>>>>>>>>> to
> > > > >>>>>>>>>>>> get
> > > > >>>>>>>>>>>>>>>> wrong,
> > > > >>>>>>>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>>>>>> closer
> > > > >>>>>>>>>>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> what
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> users actually want to achieve with
> > > > >>>>>>>>>>>> subdags?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in
> > > > >>>>>>>>>>>> subdags
> > > > >>>>>>>>>>>>>>> could
> > > > >>>>>>>>>>>>>>>>>> start
> > > > >>>>>>>>>>>>>>>>>>>>>> running
> > > > >>>>>>>>>>>>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should
> > > > >>>>>>>>>> we
> > > > >>>>>>>>>>>> not
> > > > >>>>>>>>>>>>>>> also
> > > > >>>>>>>>>>>>>>>>> just
> > > > >>>>>>>>>>>>>>>>>>>>>>> _entirely_
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> remove
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace
> > > > >>>>>>>>>> it
> > > > >>>>>>>>>>>> with
> > > > >>>>>>>>>>>>>>>>> something
> > > > >>>>>>>>>>>>>>>>>>>>>> simpler.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I
> > > > >>>>>>>>>>> haven't
> > > > >>>>>>>>>>>>> used
> > > > >>>>>>>>>>>>>>>> them
> > > > >>>>>>>>>>>>>>>>>>>>>> extensively
> > > > >>>>>>>>>>>>>>>>>>>>>>> so
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> may
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it
> > > > >>>>>>>>>>>> has(?)
> > > > >>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>> be
> > > > >>>>>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>> form
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own
> > > > >>>>>>>>>> schedule_interval,
> > > > >>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>> has
> > > > >>>>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>>>> match
> > > > >>>>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> parent
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> dag
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own.
> > > > >>>>>>>>>>>> (Does
> > > > >>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>> make
> > > > >>>>>>>>>>>>>>>>>>> sense
> > > > >>>>>>>>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>>>>>> do
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> this?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the
> > > > >>>>>>>>>>> sub
> > > > >>>>>>>>>>>>> dag
> > > > >>>>>>>>>>>>>>>> would
> > > > >>>>>>>>>>>>>>>>>>> never
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> execute, so
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to
> > > > >>>>>>>>>>>>> operate a
> > > > >>>>>>>>>>>>>>>>> subdag
> > > > >>>>>>>>>>>>>>>>>>> with
> > > > >>>>>>>>>>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> always
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> bit of a kludge.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thoughts?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash
> > > > >>>>>>>>>>>>>> Berlin-Taylor <
> > > > >>>>>>>>>>>>>>>>>>>>>> ash@apache.org>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm
> > > > >>>>>>>>>>>>> excited
> > > > >>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>> see
> > > > >>>>>>>>>>>>>>>>>> how
> > > > >>>>>>>>>>>>>>>>>>>>> this
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> progresses.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > > >>>>>>>>>>> parsing*:
> > > > >>>>>>>>>>>>> This
> > > > >>>>>>>>>>>>>>>>>> rewrites
> > > > >>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > > >>>>>>>>>>> parsing,
> > > > >>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>>>>>> give a
> > > > >>>>>>>>>>>>>>>>>>>>>>> flat
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation
> > > > >>>>>>>>>>>> already
> > > > >>>>>>>>>>>>>> does
> > > > >>>>>>>>>>>>>>>>> this
> > > > >>>>>>>>>>>>>>>>>> I
> > > > >>>>>>>>>>>>>>>>>>>>> think.
> > > > >>>>>>>>>>>>>>>>>>>>>>> At
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> least
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> if
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here
> > > > >>>>>>>>>>>> correctly.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin
> > > > >>>>>>>>>>>> Huang <
> > > > >>>>>>>>>>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone and
> > > > >>>>>>>>>>>> collect
> > > > >>>>>>>>>>>>>>>>> feedback
> > > > >>>>>>>>>>>>>>>>>> on
> > > > >>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> AIP-34
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> on
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was
> > > > >>>>>>>>>>>>>> previously
> > > > >>>>>>>>>>>>>>>>>> briefly
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> mentioned in
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be
> > > > >>>>>>>>>>> done
> > > > >>>>>>>>>>>>> for
> > > > >>>>>>>>>>>>>>>>> Airflow
> > > > >>>>>>>>>>>>>>>>>>> 2.0,
> > > > >>>>>>>>>>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> one of
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator
> > > > >>>>>>>>>>> attach
> > > > >>>>>>>>>>>>>> tasks
> > > > >>>>>>>>>>>>>>>> back
> > > > >>>>>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>> root
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> DAG.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving
> > > > >>>>>>>>>>>>>> SubDagOperator
> > > > >>>>>>>>>>>>>>>>>> related
> > > > >>>>>>>>>>>>>>>>>>>>>> issues
> > > > >>>>>>>>>>>>>>>>>>>>>>> by
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> reattaching
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag
> > > > >>>>>>>>>> while
> > > > >>>>>>>>>>>>>>> respecting
> > > > >>>>>>>>>>>>>>>>>>>>>> dependencies
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> during
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping
> > > > >>>>>>>>>> effect
> > > > >>>>>>>>>>>> on
> > > > >>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> UI
> > > > >>>>>>>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>>>>> be
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> achieved
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> through
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory
> > > > >>>>>>>>>>>> function
> > > > >>>>>>>>>>>>>> more
> > > > >>>>>>>>>>>>>>>>>>> reusable
> > > > >>>>>>>>>>>>>>>>>>>>>>> because
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> you
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> don't
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and
> > > > >>>>>>>>>>>>>>> child_dag_name
> > > > >>>>>>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>> function
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> signature
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> anymore.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > > >>>>>>>>>>> parsing*:
> > > > >>>>>>>>>>>>> This
> > > > >>>>>>>>>>>>>>>>>> rewrites
> > > > >>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > > >>>>>>>>>>> parsing,
> > > > >>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>>>>>> give a
> > > > >>>>>>>>>>>>>>>>>>>>>>> flat
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The
> > > > >>>>>>>>>> new
> > > > >>>>>>>>>>>>>>>>> SubDagOperator
> > > > >>>>>>>>>>>>>>>>>>>> acts
> > > > >>>>>>>>>>>>>>>>>>>>>>> like a
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> container and most of the original
> > > > >>>>>>>>>>>>> methods
> > > > >>>>>>>>>>>>>>> are
> > > > >>>>>>>>>>>>>>>>>>> removed.
> > > > >>>>>>>>>>>>>>>>>>>>> The
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> signature is
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also changed to *subdag_factory
> > > > >>>>>>>>>> *with
> > > > >>>>>>>>>>>>>>>>> *subdag_args
> > > > >>>>>>>>>>>>>>>>>>> *and
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> *subdag_kwargs*.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is similar to the
> > > > >>>>>>>>>> PythonOperator
> > > > >>>>>>>>>>>>>>>> signature.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add
> > > > >>>>>>>>>>>>>>> current_group
> > > > >>>>>>>>>>>>>>>> &
> > > > >>>>>>>>>>>>>>>>>>>>>> parent_group
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> attributes
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to BaseOperator*: This metadata is
> > > > >>>>>>>>>>> used
> > > > >>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>> group
> > > > >>>>>>>>>>>>>>>>>>> tasks
> > > > >>>>>>>>>>>>>>>>>>>>> for
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rendering at
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UI level. It may potentially extend
> > > > >>>>>>>>>>>>> further
> > > > >>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>> group
> > > > >>>>>>>>>>>>>>>>>>>>>>> arbitrary
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> tasks
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to
> > > > >>>>>>>>>>> allow
> > > > >>>>>>>>>>>>>>>>> group-level
> > > > >>>>>>>>>>>>>>>>>>>>>> operations
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> (i.e.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> stop/trigger a group of tasks within
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>> dag)
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*:
> > > > >>>>>>>>>> Proposed
> > > > >>>>>>>>>>>> UI
> > > > >>>>>>>>>>>>>>>>>> modification
> > > > >>>>>>>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>>>>>>> allow
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a
> > > > >>>>>>>>>>>> flat
> > > > >>>>>>>>>>>>>>>>> structure
> > > > >>>>>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>>>>> pair
> > > > >>>>>>>>>>>>>>>>>>>>>>> with
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> first
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> change instead of the original
> > > > >>>>>>>>>>>>> hierarchical
> > > > >>>>>>>>>>>>>>>>>>> structure.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please see related documents and
> > > > >>>>>>>>>> PRs
> > > > >>>>>>>>>>>> for
> > > > >>>>>>>>>>>>>>>> details:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AIP:
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Original Issue:
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> https://github.com/apache/airflow/issues/8078
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Draft PR:
> > > > >>>>>>>>>>>>>>>>>>> https://github.com/apache/airflow/pull/9243
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any
> > > > >>>>>>>>>>>>> aspects
> > > > >>>>>>>>>>>>>>>> that
> > > > >>>>>>>>>>>>>>>>>> you
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> agree/disagree
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with or
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need more clarification (especially
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>> third
> > > > >>>>>>>>>>>>>>>>>> change
> > > > >>>>>>>>>>>>>>>>>>>>>>> regarding
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup).
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Any comments are welcome and I am
> > > > >>>>>>>>>>>> looking
> > > > >>>>>>>>>>>>>>>> forward
> > > > >>>>>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>>>> it!
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>>>>>>>>>>> Thanks & Regards
> > > > >>>>>>>>>>>>>>>>>>>>>>> Poornima
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Jarek Potiuk
> > > > >>>>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal
> > > > >> Software
> > > > >>>>>> Engineer
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129>
> > > > >> <+48660796129
> > > > >>>>>>>>>>>>> <+48%20660%20796%20129>>
> > > > >>>>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Jarek Potiuk
> > > > >>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal
> Software
> > > > >>>>> Engineer
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129>
> <+48660796129
> > > > >>>>>>>>>>>>> <+48%20660%20796%20129>>
> > > > >>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> --
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> *Jacob Ferriero*
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Strategic Cloud Engineer: Data Engineering
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> jferriero@google.com
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> 617-714-2509
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by Yu Qian <yu...@gmail.com>.
The vote for this AIP-34
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
passed. However, there's an interesting discussion going on here
<https://github.com/apache/airflow/pull/10153#discussion_r480247681>
regarding whether task_id should be automatically prefixed with group_id of
TaskGroup. So I'm bringing it up in this email thread for discussion.

Plan A: Prefix task_id with the group_id of its TaskGroup. This is the
original plan in AIP-34
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
The task_id argument passed to an operator only needs to be unique within
its TaskGroup. The actual task_id is prefixed with the group_id, so it is
guaranteed to be unique across the DAG.

Plan B: Do not prefix task_id with the group_id of its TaskGroup. The
task_id argument passed to the operator is the actual task_id, so the user
is responsible for making sure task_id is unique across the whole DAG.
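
To make the difference concrete, here is a rough sketch of how the actual
task_id could be derived under each plan. It does not use Airflow at all:
the Group class, the actual_task_id helper and the "." separator are all
assumptions made up for illustration, not what the PR implements.

```python
from typing import List, Optional


class Group:
    """Hypothetical stand-in for a TaskGroup; not Airflow's real class."""

    def __init__(self, group_id: str, parent: Optional["Group"] = None):
        self.group_id = group_id
        self.parent = parent

    def path(self) -> List[str]:
        # Walk up to the root so every ancestor's group_id is included.
        parts = []
        node = self
        while node is not None:
            parts.append(node.group_id)
            node = node.parent
        return list(reversed(parts))


def actual_task_id(task_id: str, group: Optional[Group], prefix: bool,
                   sep: str = ".") -> str:
    # Plan A (prefix=True): prepend the ids of all enclosing groups.
    # Plan B (prefix=False): the task_id is used exactly as given.
    if not prefix or group is None:
        return task_id
    return sep.join(group.path() + [task_id])


section_1 = Group("section_1")
inner = Group("inside_section_1", parent=section_1)

plan_a_id = actual_task_id("task-1", inner, prefix=True)
plan_b_id = actual_task_id("task-1", inner, prefix=False)
print(plan_a_id)  # section_1.inside_section_1.task-1
print(plan_b_id)  # task-1
```

Note that Plan A is the only one that needs the parent chain: under Plan B
the group never has to be consulted to compute the id.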

Obviously the convenience of Plan A is not free of charge. I've summarized
some of the pros and cons in the table below. There are two examples at the
bottom illustrating the difference in usage. houqp's comments on GitHub and
some of my own experiments convinced me that Plan B has more advantages and
avoids surprises. I'm going to update AIP-34
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
according to Plan B unless I hear strong objections before 2020-09-03 07:00 UTC.




Ease of Use
  Plan A: Easier to use for new DAGs.
  Plan B: Slightly more work for the user, who has to maintain task_id
          uniqueness.

Implementation
  Plan A: A little more complicated. Each group needs to know its parent's
          group_id in order to prefix the group_id correctly.
  Plan B: Simpler. No need to know the parent TaskGroup's group_id.

Ease of Migration
  Plan A: task_id will change if TaskGroup is introduced into an existing
          DAG. Existing tasks put into a TaskGroup will appear as new tasks
          if the DAG already has some historical DagRuns. This may pose a
          barrier to adoption of TaskGroup.
  Plan B: No change in task_id when an existing task is put into a
          TaskGroup. Migrating existing DAGs to adopt TaskGroup will be
          easier.

Actual task_id
  Plan A: Tends to be longer because it is always prefixed with group_id,
          especially if the task is in a nested TaskGroup.
  Plan B: Tends to be shorter because users control the actual task_id
          themselves.

Graph label
  Plan A: Labels on Graph View tend to be shorter because task_id only
          needs to be unique within the TaskGroup.
  Plan B: Labels on Graph View tend to be longer because they display the
          actual task_id, which is a unique str across the DAG.


Plan A Example:

def create_section():
    dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(5)]

    with TaskGroup("inside_section_1") as inside_section_1:
        _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(3)]

    with TaskGroup("inside_section_2") as inside_section_2:
        _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(3)]

    dummies[-1] >> inside_section_1
    dummies[-2] >> inside_section_2
    inside_section_1 >> inside_section_2


with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
    start = DummyOperator(task_id="start")

    with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
        create_section()

    some_other_task = DummyOperator(task_id="some-other-task")

    with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
        create_section()

    end = DummyOperator(task_id='end')

    start >> section_1 >> some_other_task >> section_2 >> end


Plan B Example:

def create_section(section_num):
    dummies = [DummyOperator(task_id=f'task-{section_num}.{i + 1}') for i in range(5)]

    with TaskGroup(f"section_{section_num}.1") as inside_section_1:
        _ = [DummyOperator(task_id=f'task-{section_num}.1.{i + 1}',) for i in range(3)]

    with TaskGroup(f"section_{section_num}.2") as inside_section_2:
        _ = [DummyOperator(task_id=f'task-{section_num}.2.{i + 1}',) for i in range(3)]

    dummies[-1] >> inside_section_1
    dummies[-2] >> inside_section_2
    inside_section_1 >> inside_section_2


with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
    start = DummyOperator(task_id="start")

    with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
        create_section(1)

    some_other_task = DummyOperator(task_id="some-other-task")

    with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
        create_section(2)

    end = DummyOperator(task_id='end')

    start >> section_1 >> some_other_task >> section_2 >> end
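
Under Plan B, keeping task_id unique across the whole DAG is the user's job.
As a rough sketch of what that means for the naming scheme in the Plan B
example, the snippet below rebuilds the ids as plain strings and looks for
collisions. duplicate_task_ids() and section_task_ids() are hypothetical
helpers written for this illustration; they are not part of Airflow.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_task_ids(task_ids: Iterable[str]) -> List[str]:
    """Return the ids that occur more than once, sorted alphabetically."""
    counts = Counter(task_ids)
    return sorted(tid for tid, n in counts.items() if n > 1)


def section_task_ids(section_num: int) -> List[str]:
    """Mirror the task_id naming scheme used in the Plan B example."""
    outer = [f"task-{section_num}.{i + 1}" for i in range(5)]
    inner_1 = [f"task-{section_num}.1.{i + 1}" for i in range(3)]
    inner_2 = [f"task-{section_num}.2.{i + 1}" for i in range(3)]
    return outer + inner_1 + inner_2


# All ids in the Plan B example DAG.
plan_b_ids = (
    ["start", "some-other-task", "end"]
    + section_task_ids(1)
    + section_task_ids(2)
)

# The Plan A naming reused without any prefix: one outer list of five tasks
# plus one inner group of three tasks, all named task-1, task-2, ...
unprefixed_ids = [f"task-{i + 1}" for i in range(5)] + [
    f"task-{i + 1}" for i in range(3)
]

plan_b_duplicates = duplicate_task_ids(plan_b_ids)
unprefixed_duplicates = duplicate_task_ids(unprefixed_ids)
```

The Plan B scheme encodes the section number in every id, so it yields no
duplicates, whereas the short Plan A names collide as soon as the prefix is
dropped.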


On Sat, Aug 22, 2020 at 1:02 AM Gerard Casas Saez
<gc...@twitter.com.invalid> wrote:

> Agree on this being non-blocking.
>
> Regarding moving to vote, you can take care. Just open a new email thread
> on dev list and call for a vote. You can see this example from Tomek for
> AIP-31:
>
> https://lists.apache.org/thread.html/r16ee2b6263f5849a2e2b1b6c3ba39fb3d643022f052216cb19a0a8da%40%3Cdev.airflow.apache.org%3E
>
> Best,
>
>
> Gerard Casas Saez
> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
>
>
> On Thu, Aug 20, 2020 at 7:10 PM Yu Qian <yu...@gmail.com> wrote:
>
> > Hi, Gerard, yes I agree it's possible to do this at UI level without any
> > fundamental change to the implementation. If expand_group() sees that two
> > groups are fully connected (i.e. every task in one parent group depends on
> > every task in another parent group), it can decide to collapse all those
> > children edges into a single edge between the parent groups to reduce the
> > burden of the layout() function. However, I did not find any existing
> > algorithm to do this within dagre so we'll likely need to implement this
> > ourselves. Another hiccup is that at the moment it doesn't seem to be
> > possible to call setEdge() between two parent groups (aka clusters). If
> > someone has ideas how to do this please feel free to contribute.
> >
> > One other consideration is that this example is only an extreme case. There
> > are other in-between cases that still require user intervention. Let's say
> > 90% of tasks in group1 depend on 90% of tasks in group2 and both groups
> > have more than 100 tasks. This will still cause a lot of edges on the graph
> > and it's even harder to reduce because the parent groups are not fully
> > connected so it's inaccurate to reduce them to a single edge between the
> > parents. In those cases, the user may still need to do something
> > themselves, e.g. adding some DummyOperator to the DAG to cut down the
> > edges. There will be some tradeoff because DummyOperator takes a short
> > while to execute like you mentioned.
> >
> > There is lots of room for improvement, but I don't think that's a
> > blocking issue for this AIP? So if you can move it to the voting stage
> > that'll be fantastic.
> >
> >
> > On Thu, Aug 20, 2020 at 4:21 PM 耀 周 <zh...@icloud.com.invalid>
> > wrote:
> >
> > > +1
> > >
> > > > On Aug 18, 2020, at 23:55, Gerard Casas Saez <gc...@twitter.com.INVALID> wrote:
> > > >
> > > > Is it not possible to solve this at the UI level? Aka tell dagre to only
> > > > add 1 edge to the group instead of to all nodes in the group? No need to do
> > > > SubDag behaviour, but just reduce the edges on the graph. Should reduce
> > > > load time if I understand correctly.
> > > >
> > > > I would strongly avoid the Dummy operator since it will introduce delays on
> > > > operator execution (as it will need to execute 1 dummy operator and that
> > > > can be expensive imo).
> > > >
> > > > Overall though proposal looks good, unless anyone opposes it, I would move
> > > > this to vote mode :D
> > > >
> > > > Gerard Casas Saez
> > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > >
> > > >
> > > > On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yu...@gmail.com>
> wrote:
> > > >
> > > >> Hi, All,
> > > >> Here's the updated AIP-34
> > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> > > >> The PR has been fine-tuned with better UI interactions and added
> > > >> serialization of TaskGroup: https://github.com/apache/airflow/pull/10153
> > > >>
> > > >> Here are some experiment results:
> > > >> A made-up DAG containing 403 tasks and 5696 edges, grouped as in the
> > > >> code at the end of this message. Note that inside_section_2 is
> > > >> intentionally made to depend on all tasks in inside_section_1 to
> > > >> generate a large number of edges. The observation is that opening the
> > > >> top-level graph is very quick, around 270ms. Expanding groups that don't
> > > >> have a lot of dense dependencies on other groups is also hardly
> > > >> noticeable, e.g. expanding section_1 takes 330ms. The part that takes
> > > >> time is expanding both inside_section_1 and inside_section_2.
> > > >> Because there are 2500 edges between these two inner groups, it took 63
> > > >> seconds to expand both of them. The majority of the time (more than 62
> > > >> seconds) is actually taken by the layout() function in dagre. In other
> > > >> words, it's very fast to add nodes and edges, but laying them out on the
> > > >> graph takes time. This issue is not actually a problem specific to
> > > >> TaskGroup. Without TaskGroup, if a DAG contains too many edges, it takes
> > > >> time to lay out the graph too.
> > > >>
> > > >> On the other hand, a more realistic experiment with a production DAG
> > > >> containing about 400 tasks and 700 edges showed that grouping tasks into
> > > >> three levels of nested TaskGroup cut the upfront page opening time from
> > > >> around 6s to 500ms. (Obviously the time is paid back when the user
> > > >> gradually expands all the groups one by one, but normally people don't
> > > >> need to expand every group every time so it's still a big saving.) The
> > > >> experiments were done on OS X Mojave, 2.2 GHz Intel Core i7, 16GB
> > > >> memory, Chrome.
> > > >>
> > > >> I can see a few possible improvements to TaskGroup (or how it's used) that
> > > >> can be done as a next step:
> > > >> 1). Like Gerard suggested, we can implement lazy-loading. Instead of
> > > >> displaying the whole DAG, we can limit the Graph View to show only a single
> > > >> TaskGroup, omitting its edges going out to other TaskGroups. This behaviour
> > > >> is more like SubDagOperator where users can zoom into/out of a TaskGroup
> > > >> and look at only tasks within that TaskGroup as if those are the only tasks
> > > >> on the DAG. This can be done with either background JavaScript calls or by
> > > >> making a new GET request with filtering parameters. Obviously the downside
> > > >> is that it's not as explicit as showing all the dependencies on the graph.
> > > >> 2). Users can improve the organization of the DAG themselves to reduce the
> > > >> number of edges. E.g. if every task in group2 depends on every task in
> > > >> group1, instead of doing group1 >> group2, they can add a DummyOperator in
> > > >> between and do this: group1 >> dummy >> group2. This cuts down the number
> > > >> of edges significantly and page load becomes much faster.
> > > >> 3). If we really want, we can improve the >> operator of TaskGroup to do 2)
> > > >> automatically. If it sees that both sides of >> are TaskGroup, it can
> > > >> create a DummyOperator on behalf of the user. The downside is that it may
> > > >> be too much magic.
> > > >>
> > > >> Thanks,
> > > >> Qian
> > > >>
> > > >> def create_section():
> > > >>     """
> > > >>     Create tasks in the outer section.
> > > >>     """
> > > >>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
> > > >>
> > > >>     with TaskGroup("inside_section_1") as inside_section_1:
> > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> > > >>
> > > >>     with TaskGroup("inside_section_2") as inside_section_2:
> > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> > > >>
> > > >>     dummies[-1] >> inside_section_1
> > > >>     dummies[-2] >> inside_section_2
> > > >>     inside_section_1 >> inside_section_2
> > > >>
> > > >>
> > > >> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> > > >>     start = DummyOperator(task_id="start")
> > > >>
> > > >>     with TaskGroup("section_1") as section_1:
> > > >>         create_section()
> > > >>
> > > >>     some_other_task = DummyOperator(task_id="some-other-task")
> > > >>
> > > >>     with TaskGroup("section_2") as section_2:
> > > >>         create_section()
> > > >>
> > > >>     end = DummyOperator(task_id='end')
> > > >>
> > > >>     start >> section_1 >> some_other_task >> section_2 >> end
> > > >>
> > > >>
> > > >> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
> > > >> <gc...@twitter.com.invalid> wrote:
> > > >>
> > > >>> Re graph times. That makes sense. Let me know what you find. We may be able
> > > >>> to contribute on the lazy loading part.
> > > >>>
> > > >>> Looking forward to seeing the updated AIP!
> > > >>>
> > > >>>
> > > >>> Gerard Casas Saez
> > > >>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > >>>
> > > >>>
> > > >>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <ka...@gmail.com>
> > > wrote:
> > > >>>
> > > >>>> Permissions granted, let me know if you face any issues.
> > > >>>>
> > > >>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yu...@gmail.com>
> > wrote:
> > > >>>>
> > > >>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> > > >>>>>
> > > >>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <ka...@gmail.com>
> > > >>> wrote:
> > > >>>>>
> > > >>>>>> What's your ID i.e. if you haven't created an account yet,
> please
> > > >>>> create
> > > >>>>>> one at https://cwiki.apache.org/confluence/signup.action and
> send
> > > >> us
> > > >>>>> your
> > > >>>>>> ID and we will add permissions.
> > > >>>>>>
> > > >>>>>> Thanks. I'll edit the AIP. May I request permission to edit it?
> > > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yu...@gmail.com>
> > > >>> wrote:
> > > >>>>>>
> > > >>>>>>> Re Xinbin: Thanks. I'll edit the AIP. May I request permission to edit
> > > >>>>>>> it? My wiki user email is yuqian1990@gmail.com.
> > > >>>>>>>
> > > >>>>>>> Re Gerard: yes, the UI loads all the nodes as JSON from the webserver at
> > > >>>>>>> once. However, it only adds the top-level nodes and edges to the graph
> > > >>>>>>> when the Graph View page is first opened, and then adds the expanded
> > > >>>>>>> nodes to the graph as the user expands them. From what I've experienced
> > > >>>>>>> with DAGs containing around 400 tasks (not using TaskGroup or
> > > >>>>>>> SubDagOperator), opening the whole DAG in Graph View usually takes 5
> > > >>>>>>> seconds. Less than 60ms of that is taken by loading the data from the
> > > >>>>>>> webserver. The remaining 4.9s+ is taken by JavaScript functions in
> > > >>>>>>> dagre-d3.min.js such as createNodes, createEdgeLabels, etc. and by
> > > >>>>>>> rendering the graph. With TaskGroup being used to group tasks into a
> > > >>>>>>> smaller number of top-level nodes, the amount of data loaded from the
> > > >>>>>>> webserver will remain about the same compared to a flat DAG of the same
> > > >>>>>>> size, but the number of nodes and edges that need to be plotted on the
> > > >>>>>>> graph can be reduced significantly. So in theory this should reduce
> > > >>>>>>> the time it takes to open Graph View even without lazy-loading the data
> > > >>>>>>> (I'll experiment to find out). That said, if it comes to a point where
> > > >>>>>>> lazy-loading helps, we can still implement it as an improvement.
> > > >>>>>>>
> > > >>>>>>> Re James: the Tree View looks as if all the groups are fully expanded
> > > >>>>>>> (because under the hood all the tasks are in a single DAG). I'm less
> > > >>>>>>> worried about Tree View at the moment because it already has a
> > > >>>>>>> mechanism for collapsing tasks by the dependency tree. That said, the
> > > >>>>>>> Tree View can definitely be improved too with TaskGroup (e.g. collapse
> > > >>>>>>> tasks in the same TaskGroup when Tree View is first opened).
> > > >>>>>>>
> > > >>>>>>> For both suggestions, implementing them doesn't require fundamental
> > > >>>>>>> changes to the idea. I think we can have a basic working TaskGroup
> > > >>>>>>> first, and then improve it incrementally in several PRs as we get more
> > > >>>>>>> feedback from the community. What do you think?
> > > >>>>>>>
> > > >>>>>>> Qian
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <
> jcoder01@gmail.com>
> > > >>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> I agree this looks great, one question, how does the tree view
> > > >>>> look?
> > > >>>>>>>>
> > > >>>>>>>> James Coder
> > > >>>>>>>>
> > > >>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <
> > > >>>>>> gcasassaez@twitter.com
> > > >>>>>>> .invalid>
> > > >>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> First of all, this is awesome!!
> > > >>>>>>>>>
> > > >>>>>>>>> Secondly, checking your UI code, it seems you are loading all operators at
> > > >>>>>>>>> once. Wondering if we can load them as needed (aka load whenever we click
> > > >>>>>>>>> the TaskGroup). Some of our DAGs are so large that they take forever to
> > > >>>>>>>>> load on the Graph view, so worried about this still being an issue here. It
> > > >>>>>>>>> may be easily solvable by implementing lazy loading of the graph. Not sure
> > > >>>>>>>>> how easy it is to implement/add to the UI extension (and I don't want to
> > > >>>>>>>>> push for early optimization as it's the root of all evil).
> > > >>>>>>>>> Gerard Casas Saez
> > > >>>>>>>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <
> > > >>>>>> bin.huangxb@gmail.com>
> > > >>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>> Hi Yu,
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thank you so much for taking on this. I was fairly distracted previously
> > > >>>>>>>>>> and I didn't have the time to update the proposal. In fact, after
> > > >>>>>>>>>> discussing with Ash, Kaxil and Daniel, the direction of this AIP has been
> > > >>>>>>>>>> changed to favor the concept of TaskGroup instead of rewriting
> > > >>>>>>>>>> SubDagOperator (though it may make sense to deprecate SubDag at a future
> > > >>>>>>>>>> date).
> > > >>>>>>>>>>
> > > >>>>>>>>>> Your PR is amazing and it has implemented the desired features. I think we
> > > >>>>>>>>>> can focus on your new PR instead. Do you mind updating the AIP based on
> > > >>>>>>>>>> what you have done in your PR?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Bin
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <
> > > >>> yuqian1990@gmail.com>
> > > >>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Hi, all, I've added the basic UI changes to my proposed implementation of
> > > >>>>>>>>>>> TaskGroup as a UI grouping concept:
> > > >>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup so I'm quoting
> > > >>>>>>>>>>> it here. The only thing I don't fully agree with is the restriction
> > > >>>>>>>>>>> "... *cannot* have dependencies between a Task in a TaskGroup and either a
> > > >>>>>>>>>>> Task in a different TaskGroup or a Task not in any group". I think
> > > >>>>>>>>>>> this is over-restrictive. Since TaskGroup is a UI concept, tasks can have
> > > >>>>>>>>>>> dependencies on tasks in other TaskGroups or not in any TaskGroup. In my
> > > >>>>>>>>>>> PR, this is allowed. The graph edges will update accordingly when
> > > >>>>>>>>>>> TaskGroups are expanded/collapsed. TaskGroup is only helping to make the
> > > >>>>>>>>>>> UI look less crowded. Under the hood, everything is still a DAG of tasks
> > > >>>>>>>>>>> and edges so things work normally. Here's a screenshot
> > > >>>>>>>>>>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > > >>>>>>>>>>> of the UI interaction.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>    - Tasks can be added to a TaskGroup
> > > >>>>>>>>>>>    - You *can* have dependencies between Tasks in the same TaskGroup, but
> > > >>>>>>>>>>>      *cannot* have dependencies between a Task in a TaskGroup and either a
> > > >>>>>>>>>>>      Task in a different TaskGroup or a Task not in any group
> > > >>>>>>>>>>>    - You *can* have dependencies between a TaskGroup and either other
> > > >>>>>>>>>>>      TaskGroups or Tasks not in any group
> > > >>>>>>>>>>>    - The UI will by default render a TaskGroup as a single "object", but
> > > >>>>>>>>>>>      which you expand or zoom into in some way
> > > >>>>>>>>>>>    - You'd need some way to determine what the "status" of a TaskGroup was
> > > >>>>>>>>>>>      at least for UI display purposes
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Regarding Jake's comment, I agree it's possible to implement the
> > > >>>>>>>>>>> "retrying tasks in a group" pattern he mentioned as an optional feature
> > > >>>>>>>>>>> of TaskGroup, although that may go against having TaskGroup as a pure UI
> > > >>>>>>>>>>> concept. For the motivating example Jake provided, I suggest implementing
> > > >>>>>>>>>>> both SubmitLongRunningJobTask and PollJobStatusSensor in a single
> > > >>>>>>>>>>> operator. It can do something like what BaseSensorOperator.execute() does
> > > >>>>>>>>>>> in "reschedule" mode, i.e. it first executes some code to submit the
> > > >>>>>>>>>>> long-running job to the external service and stores the state (e.g. in
> > > >>>>>>>>>>> XCom), then reschedules itself. Subsequent runs then poke for the
> > > >>>>>>>>>>> completion state.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> > > >>>>>>>>>> <jferriero@google.com.invalid
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> I really like this idea of a TaskGroup container as I think this will be
> > > >>>>>>>>>>>> much easier to use than SubDag.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I'd like to propose an optional behavior for special retry mechanics via a
> > > >>>>>>>>>>>> TaskGroup.retry_all property.
> > > >>>>>>>>>>>> This way I could use TaskGroup to replace my favorite use of SubDag for
> > > >>>>>>>>>>>> atomically retrying tasks of the pattern "act on external state then
> > > >>>>>>>>>>>> reschedule poll until desired state reached".
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> The motivating use case I have for a SubDag is a very simple two-task group
> > > >>>>>>>>>>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > >>>>>>>>>>>> I use SubDag because it gives me an easy way to retry the SubmitJobTask
> > > >>>>>>>>>>>> if something about the PollJobSensor fails.
> > > >>>>>>>>>>>> This pattern would be really nice for jobs that are expected to run a long
> > > >>>>>>>>>>>> time (because the sensor can use reschedule mode, freeing up slots)
> > > >>>>>>>>>>>> but might fail for a retryable reason.
> > > >>>>>>>>>>>> However, using SubDag to meet this use case defeats the purpose because
> > > >>>>>>>>>>>> SubDag infamously
> > > >>>>>>>>>>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> > > >>>>>>>>>>>> blocks a "controller" slot for the entire duration.
> > > >>>>>>>>>>>> This may feel like cyclic behavior, but in reality it is very common for a
> > > >>>>>>>>>>>> single operator to submit job / wait til done.
> > > >>>>>>>>>>>> We could use this case to refactor many operators (e.g. BQ, Dataproc,
> > > >>>>>>>>>>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with an
> > > >>>>>>>>>>>> optional reschedule mode if the user knows that this job may take a long
> > > >>>>>>>>>>>> time.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I'd be happy to do the development work on adding this specific retry
> > > >>>>>>>>>>>> behavior to TaskGroup once the base concept is implemented if others in
> > > >>>>>>>>>>>> the community would find this a useful feature.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>> Jake
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> > > >>>>>>>> Jarek.Potiuk@polidea.com
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> All for it :) . I think we are getting closer to have
> > > >>> regular
> > > >>>>>>>>>> planning
> > > >>>>>>>>>>>> and
> > > >>>>>>>>>>>>> making some structured approach to 2.0 and starting task
> > > >>>> force
> > > >>>>>> for
> > > >>>>>>> it
> > > >>>>>>>>>>>> soon,
> > > >>>>>>>>>>>>> so I think this should be perfectly fine to discuss and
> > > >>> even
> > > >>>>>> start
> > > >>>>>>>>>>>>> implementing what's beyond as soon as we make sure that
> > > >> we
> > > >>>> are
> > > >>>>>>>>>>>> prioritizing
> > > >>>>>>>>>>>>> 2.0 work.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> J,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <
> > > >>>> yuqian1990@gmail.com>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Hi Jarek,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I agree we should not change the behaviour of the existing SubDagOperator
> > > >>>>>>>>>>>>>> till Airflow 2.1. Is it okay to continue the discussion about TaskGroup as
> > > >>>>>>>>>>>>>> a brand new concept/feature independent from the existing SubDagOperator?
> > > >>>>>>>>>>>>>> In other words, shall we add TaskGroup as a UI grouping concept like Ash
> > > >>>>>>>>>>>>>> suggested, and not touch SubDagOperator at all? Whenever we are ready with
> > > >>>>>>>>>>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I really like Ash's idea of simplifying the SubDagOperator idea into a
> > > >>>>>>>>>>>>>> simple UI grouping concept. I think Xinbin's idea of "reattaching all the
> > > >>>>>>>>>>>>>> tasks to the root DAG" is the way to go. And I see James pointed out we
> > > >>>>>>>>>>>>>> need some helper functions to simplify dependencies setting of TaskGroup.
> > > >>>>>>>>>>>>>> Xinbin put up a pretty elegant example in his PR
> > > >>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I think having TaskGroup as
> > > >>>>>>>>>>>>>> a UI concept should be a relatively small change. We can simplify Xinbin's
> > > >>>>>>>>>>>>>> PR further. So I put up this alternative proposal here:
> > > >>>>>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I have not done any UI changes due to lack of experience with web UI. If
> > > >>>>>>>>>>>>>> anyone's interested, please take a look at the PR.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Qian
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > > >>>>>>>>>>> Jarek.Potiuk@polidea.com
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Similar point here to the other ideas that are popping up. Maybe we should
> > > >>>>>>>>>>>>>>> just focus on completing 2.0 and make all discussions about further
> > > >>>>>>>>>>>>>>> improvements to 2.1? While those are important discussions (and we should
> > > >>>>>>>>>>>>>>> continue them in the near future!) I think at this point focusing on
> > > >>>>>>>>>>>>>>> delivering 2.0 in its current shape should be our focus now?
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> J.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > > >>>>>>>>>>> bin.huangxb@gmail.com>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Hi Daniel
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same API
> > > >> as a
> > > >>>> DAG
> > > >>>>>>>>>>> object
> > > >>>>>>>>>>>>>>> related
> > > >>>>>>>>>>>>>>>> to task dependencies, but it will not have anything
> > > >>>> related
> > > >>>>> to
> > > >>>>>>>>>>>> actual
> > > >>>>>>>>>>>>>>>> execution or scheduling.
> > > >>>>>>>>>>>>>>>> I will update the AIP according to this over the
> > > >>> weekend.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when
> > > >> you
> > > >>>>>>>>>> import
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> object
> > > >>>>>>>>>>>>>>>> you can import it with parameters to determine the
> > > >> shape
> > > >>>> of
> > > >>>>>> the
> > > >>>>>>>>>>>> DAG.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it serve a
> > > >>>>> similar
> > > >>>>>>>>>>>> purpose
> > > >>>>>>>>>>>>>> as
> > > >>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>> DAG factory function?
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > > >>>>>>>>>>>>>>> daniel.imberman@gmail.com
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Hi Bin,
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g. the bitwise
> > > >>>>>>>>>>>>>>>>> operator for task dependencies)? We could even make a “DAGTemplate” object
> > > >>>>>>>>>>>>>>>>> s.t. when you import the object you can import it with parameters to
> > > >>>>>>>>>>>>>>>>> determine the shape of the DAG.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > > >>>>>>>>>>>>> bin.huangxb@gmail.com
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>> The TaskGroup will not take schedule interval as a
> > > >>>>> parameter
> > > >>>>>>>>>>>>> itself,
> > > >>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>> depends on the DAG where it attaches to. In my
> > > >> opinion,
> > > >>>> the
> > > >>>>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>> only contain a group of tasks with interdependencies,
> > > >>> and
> > > >>>>> the
> > > >>>>>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>>>>>> behaves like a task. It doesn't contain any
> > > >>>>>>>>>>> execution/scheduling
> > > >>>>>>>>>>>>>> logic
> > > >>>>>>>>>>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs
> > > >>>> etc.)
> > > >>>>>>>>>>> like
> > > >>>>>>>>>>>> a
> > > >>>>>>>>>>>>>> DAG
> > > >>>>>>>>>>>>>>>>> does.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> For example, there is the scenario that the schedule
> > > >>>>>>>>>> interval
> > > >>>>>>>>>>>> of
> > > >>>>>>>>>>>>>> DAG
> > > >>>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20
> > > >>> min.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case
> > > >> that
> > > >>>> you
> > > >>>>>>>>>> want
> > > >>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>> achieve?
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> > > >>>>>>>>>> thanosxnicholas@gmail.com
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Hi Bin,
> > > >>>>>>>>>>>>>>>>>> With TaskGroup, is the schedule interval of
> > > >> TaskGroup
> > > >>>> the
> > > >>>>>>>>>>> same
> > > >>>>>>>>>>>>> as
> > > >>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> parent DAG? My main concern is whether the schedule
> > > >>>>>>>>>> interval
> > > >>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>>>>>>> could be different from that of the DAG? For
> > > >> example,
> > > >>>>> there
> > > >>>>>>>>>>> is
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>> scenario
> > > >>>>>>>>>>>>>>>>>> that the schedule interval of DAG is 1 hour and the
> > > >>>>>>>>>> schedule
> > > >>>>>>>>>>>>>> interval
> > > >>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>> TaskGroup is 20 min.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>>>> Nicholas
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > > >>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Hi Nicholas,
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of
> > > >>> SubDagOperator,
> > > >>>>>>>>>>> maybe
> > > >>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>>> throw
> > > >>>>>>>>>>>>>>>>>>> an error? But in the original proposal, the
> > > >> subdag's
> > > >>>>>>>>>>>>>>>> schedule_interval
> > > >>>>>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to
> > > >>> replace
> > > >>>>>>>>>>>> SubDag,
> > > >>>>>>>>>>>>>>> there
> > > >>>>>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>>>> be no subdag schedule_interval.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > > >>>>>>>>>>>> thanosxnicholas@gmail.com
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Hi Bin,
> > > >>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was confused
> > > >>> whether
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>>>>> schedule
> > > >>>>>>>>>>>>>>>>>>>> interval of SubDAG is different from that of the
> > > >>>> parent
> > > >>>>>>>>>>>> DAG?
> > > >>>>>>>>>>>>> I
> > > >>>>>>>>>>>>>>> have
> > > >>>>>>>>>>>>>>>>>>>> discussed with Jiajie Zhong about the schedule
> > > >>>> interval
> > > >>>>>>>>>>> of
> > > >>>>>>>>>>>>>>> SubDAG.
> > > >>>>>>>>>>>>>>>> If
> > > >>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>> SubDagOperator has a different schedule interval,
> > > >>> what
> > > >>>>>>>>>>> will
> > > >>>>>>>>>>>>>>> happen
> > > >>>>>>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>> scheduler to schedule the parent DAG?
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Regards,
> > > >>>>>>>>>>>>>>>>>>>> Nicholas Jiang
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > >>>>>>>>>>>>>>>> bin.huangxb@gmail.com>
> > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback!
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> I have rethought about the concept of subdag and
> > > >>> task
> > > >>>>>>>>>>>>>> groups. I
> > > >>>>>>>>>>>>>>>>> think
> > > >>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>> better way to approach this is to entirely remove
> > > >>>>>>>>>>> subdag
> > > >>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>> introduce
> > > >>>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>> concept of TaskGroup, which is a container of
> > > >> tasks
> > > >>>>>>>>>>> along
> > > >>>>>>>>>>>>>> with
> > > >>>>>>>>>>>>>>>>> their
> > > >>>>>>>>>>>>>>>>>>>>> dependencies *without execution/scheduling logic
> > > >>> as a
> > > >>>>>>>>>>>> DAG*.
> > > >>>>>>>>>>>>>> The
> > > >>>>>>>>>>>>>>>>> only
> > > >>>>>>>>>>>>>>>>>>>>> purpose of it is to group a list of tasks, but
> > > >> you
> > > >>>>>>>>>>> still
> > > >>>>>>>>>>>>> need
> > > >>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>> add
> > > >>>>>>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>>>> a DAG for execution.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Here is a small code snippet.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> ```
> > > >>>>>>>>>>>>>>>>>>>>> class TaskGroup:
> > > >>>>>>>>>>>>>>>>>>>>>     """
> > > >>>>>>>>>>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> > > >>>>>>>>>>>>>>>>>>>>>     """
> > > >>>>>>>>>>>>>>>>>>>>>     def __init__(self, group_id, default_args=None):
> > > >>>>>>>>>>>>>>>>>>>>>         pass
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> """
> > > >>>>>>>>>>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to a DAG
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> This can be declared in a separate file from the dag file
> > > >>>>>>>>>>>>>>>>>>>>> """
> > > >>>>>>>>>>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
> > > >>>>>>>>>>>>>>>>>>>>> download_group.add_task(task1)
> > > >>>>>>>>>>>>>>>>>>>>> task2.dag = download_group
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> with download_group:
> > > >>>>>>>>>>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> [task1, task2] >> task3
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> """Add it to a DAG for execution"""
> > > >>>>>>>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> > > >>>>>>>>>>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
> > > >>>>>>>>>>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> > > >>>>>>>>>>>>>>>>>>>>>     start >> download_group
> > > >>>>>>>>>>>>>>>>>>>>>     # this is equivalent to
> > > >>>>>>>>>>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> > > >>>>>>>>>>>>>>>>>>>>> ```
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> With this, we can still reuse a group of tasks
> > > >> and
> > > >>>>>>>>>> set
> > > >>>>>>>>>>>>>>>> dependencies
> > > >>>>>>>>>>>>>>>>>>>> between
> > > >>>>>>>>>>>>>>>>>>>>> them; it avoids the boilerplate code from using
> > > >>>>>>>>>>>>>> SubDagOperator,
> > > >>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>>>>>> can
> > > >>>>>>>>>>>>>>>>>>>>> declare dependencies as `task >> task_group >>
> > > >>> task`.
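The `task >> task_group >> task` idea above can be sketched without Airflow at all. The following is only a toy model, not the API from the AIP or either PR: `ToyTask`, `ToyTaskGroup`, `roots()` and `leaves()` are hypothetical names invented for illustration. It assumes that linking into a group means linking to the group's tasks that have no upstream inside the group, and linking out of a group means linking from its tasks that have no downstream inside the group.

```python
class _Linkable:
    """Shared `>>` behaviour for toy tasks and toy groups (hypothetical)."""

    def __rshift__(self, other):
        # self >> other, where other may be a single item or a list of items
        for item in (other if isinstance(other, list) else [other]):
            _link(self, item)
        return other

    def __rrshift__(self, other):
        # [a, b] >> self: Python falls back here because list has no __rshift__
        for item in other:
            _link(item, self)
        return self


def _link(upstream, downstream):
    """Connect every leaf of `upstream` to every root of `downstream`."""
    leaves = upstream.leaves()
    roots = downstream.roots()
    for up in leaves:
        for down in roots:
            up.downstream.add(down.task_id)
            down.upstream.add(up.task_id)


class ToyTask(_Linkable):
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()    # ids of tasks that must finish first
        self.downstream = set()  # ids of tasks that run afterwards

    def roots(self):
        return [self]

    def leaves(self):
        return [self]


class ToyTaskGroup(_Linkable):
    """A pure container: no schedule, no execution logic of its own."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = []

    def add(self, *tasks):
        self.tasks.extend(tasks)
        return self

    def roots(self):
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.upstream & ids)]

    def leaves(self):
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.downstream & ids)]


start, task1, task2, task3, end = (
    ToyTask(name) for name in ("start", "task1", "task2", "task3", "end")
)
download_group = ToyTaskGroup("download").add(task1, task2, task3)
[task1, task2] >> task3
start >> download_group >> end  # start feeds task1/task2, task3 feeds end
```

In this sketch the group never appears in the resulting dependency sets; only the plain tasks do, which matches the point that the scheduler would see an ordinary flat DAG.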
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> User migration wise, we can introduce it before
> > > >>>>>>>>>> Airflow
> > > >>>>>>>>>>>> 2.0
> > > >>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>> allow
> > > >>>>>>>>>>>>>>>>>>>>> gradual transition. Then we can decide if we
> > > >> still
> > > >>>>>>>>>> want
> > > >>>>>>>>>>>> to
> > > >>>>>>>>>>>>>> keep
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>> SubDagOperator or simply remove it.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Any thoughts?
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime
> > > >> Beauchemin <
> > > >>>>>>>>>>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> +1, proposal looks good.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> The original intention was really to have tasks
> > > >>>>>>>>>>> groups
> > > >>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>> zoom-in/out
> > > >>>>>>>>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>>>>>>> the UI. The original reasoning was to reuse the
> > > >>> DAG
> > > >>>>>>>>>>>>> object
> > > >>>>>>>>>>>>>>>> since
> > > >>>>>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> > > >>>>>>>>>>> create
> > > >>>>>>>>>>>>>>>> underlying
> > > >>>>>>>>>>>>>>>>>>>>>> confusions since a DAG is much more than just a
> > > >>>>>>>>>> group
> > > >>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>> tasks.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Max
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > >>>>>>>>>>>>>>>>>>>>> joshipoornima06@gmail.com>
> > > >>>>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Thank you for your email.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > >>>>>>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> > > >>>>>>>>>>>>>> rewrites
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > > >>>>>>>>>> it
> > > >>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>> give a
> > > >>>>>>>>>>>>>>>>>>>> flat
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> structure at
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> the task level
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already
> > > >>>>>>>>>> does
> > > >>>>>>>>>>>>> this I
> > > >>>>>>>>>>>>>>>>> think.
> > > >>>>>>>>>>>>>>>>>> At
> > > >>>>>>>>>>>>>>>>>>>>> least
> > > >>>>>>>>>>>>>>>>>>>>>>> if
> > > >>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > > >>>>>>>>>>> representation,
> > > >>>>>>>>>>>>> but
> > > >>>>>>>>>>>>>> at
> > > >>>>>>>>>>>>>>>>> least
> > > >>>>>>>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
> > > >>>>>>>>>> In
> > > >>>>>>>>>>> my
> > > >>>>>>>>>>>>>>>> proposal
> > > >>>>>>>>>>>>>>>>> as
> > > >>>>>>>>>>>>>>>>>>>> also
> > > >>>>>>>>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
> > > >>>>>>>>>> from
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> subdag
> > > >>>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>> add
> > > >>>>>>>>>>>>>>>>>>>>>> them
> > > >>>>>>>>>>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG
> > > >> graph
> > > >>>>>>>>>>>> will
> > > >>>>>>>>>>>>>> look
> > > >>>>>>>>>>>>>>>>>> exactly
> > > >>>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>>> same as without subdag but with metadata
> > > >>>>>>>>>> attached
> > > >>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>> those
> > > >>>>>>>>>>>>>>>>>>>> sections.
> > > >>>>>>>>>>>>>>>>>>>>>>> These
> > > >>>>>>>>>>>>>>>>>>>>>>>> metadata will be later on used to render in
> > > >> the
> > > >>>>>>>>>>> UI.
> > > >>>>>>>>>>>>> So
> > > >>>>>>>>>>>>>>>> after
> > > >>>>>>>>>>>>>>>>>>>> parsing
> > > >>>>>>>>>>>>>>>>>>>>> (
> > > >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>>>>> *root_dag
> > > >>>>>>>>>>>>>>>>>>>> *instead
> > > >>>>>>>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>>>>>>> *root_dag +
> > > >>>>>>>>>>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > > >>>>>>>>>>>>>>>>>> current_group=section-1,
> > > >>>>>>>>>>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> > > >>>>>>>>>>> naming
> > > >>>>>>>>>>>>>>>>>>> suggestions),
> > > >>>>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>>> reason for parent_group is that we can have
> > > >>>>>>>>>>> nested
> > > >>>>>>>>>>>>>> group
> > > >>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>>> still
> > > >>>>>>>>>>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>>>>>>>>> able to capture the dependency.
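The flattening with metadata described above might look roughly like the sketch below. It is not the code from the draft PR: the nested-dict input shape, the `flatten` helper and the DAG id `example_dag` are assumptions made for illustration, and only the `current_group` / `parent_group` keys come from the message.

```python
def flatten(group, parent_group=None):
    """Yield one metadata record per task, walking nested groups depth-first.

    `group` is a hypothetical dict: {"id": str, "children": [task_id | group]}.
    """
    for child in group["children"]:
        if isinstance(child, dict):
            # A nested group: its tasks remember this group as their parent.
            yield from flatten(child, parent_group=group["id"])
        else:
            # A plain task id attached directly to this group.
            yield {
                "task_id": child,
                "current_group": group["id"],
                "parent_group": parent_group,
            }


root = {
    "id": "example_dag",
    "children": [
        "start",
        {"id": "section-1", "children": ["section-1-task-1", "section-1-task-2"]},
        "end",
    ],
}

records = list(flatten(root))
for record in records:
    print(record)
```

Every task ends up in one flat list, as if no subdag had been used, while the two group fields keep enough information for a UI to rebuild the nesting.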
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> Runtime DAG:
> > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> While at the UI, what we see would be
> > > >> something
> > > >>>>>>>>>>>> like
> > > >>>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>> by
> > > >>>>>>>>>>>>>>>>>>>>> utilizing
> > > >>>>>>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into it
> > > >>>>>>>>>> in
> > > >>>>>>>>>>>> some
> > > >>>>>>>>>>>>>>> way.
> > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> The benefits I can see are:
> > > >>>>>>>>>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > > >>>>>>>>>>> complexity
> > > >>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>> SubDag
> > > >>>>>>>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>>>>>>>> execution
> > > >>>>>>>>>>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> > > >>>>>>>>>> using
> > > >>>>>>>>>>>>>> SubDag.
> > > >>>>>>>>>>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
> > > >>>>>>>>>>>>> reusable
> > > >>>>>>>>>>>>>>> dag
> > > >>>>>>>>>>>>>>>>> code
> > > >>>>>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>>>>>>> declare dependencies between them. And with
> > > >> the
> > > >>>>>>>>>>> new
> > > >>>>>>>>>>>>>>>>>>> SubDagOperator
> > > >>>>>>>>>>>>>>>>>>>>> (see
> > > >>>>>>>>>>>>>>>>>>>>>>> AIP
> > > >>>>>>>>>>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
> > > >>>>>>>>>>>>> function
> > > >>>>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>>>>>> generating 1
> > > >>>>>>>>>>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
> > > >>>>>>>>>>> (in
> > > >>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>> case,
> > > >>>>>>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>>>>>>>> just
> > > >>>>>>>>>>>>>>>>>>>>>>>> extract all underlying tasks and append to the
> > > >>>>>>>>>>> root
> > > >>>>>>>>>>>>>> dag).
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> > > >>>>>>>>>>>> with a
> > > >>>>>>>>>>>>>>>>>> simpler
> > > >>>>>>>>>>>>>>>>>>>>>> concept
> > > >>>>>>>>>>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> > > >>>>>>>>>> out
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> contents
> > > >>>>>>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>>>>> SubDag
> > > >>>>>>>>>>>>>>>>>>>>>>>> and becomes more like
> > > >>>>>>>>>>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > > >>>>>>>>>>>>>>>>>>>>>>> (forgive
> > > >>>>>>>>>>>>>>>>>>>>>>>> me about the crazy name..). In this case, is
> > > >> it
> > > >>>>>>>>>>>> still
> > > >>>>>>>>>>>>>>>>>>> necessary
> > > >>>>>>>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>>>>>> keep the
> > > >>>>>>>>>>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a
> > > >>>>>>>>>>>> name?
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> > > >>>>>>>>>>>> Chris
> > > >>>>>>>>>>>>>>> Palmer
> > > >>>>>>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>>>>>> helping
> > > >>>>>>>>>>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup,
> > > >> I
> > > >>>>>>>>>>>> will
> > > >>>>>>>>>>>>>> just
> > > >>>>>>>>>>>>>>>>> paste
> > > >>>>>>>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>>>>>>> here.
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > > >>>>>>>>>> in
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>> same
> > > >>>>>>>>>>>>>>>>>>>> TaskGroup,
> > > >>>>>>>>>>>>>>>>>>>>>> but
> > > >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > > >>>>>>>>>> a
> > > >>>>>>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>>>>> either a
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > > >>>>>>>>>> in
> > > >>>>>>>>>>>> any
> > > >>>>>>>>>>>>>>> group
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > >>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>> either
> > > >>>>>>>>>>>>>>>>>>>>> other
> > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > >>>>>>>>>> as
> > > >>>>>>>>>>> a
> > > >>>>>>>>>>>>>> single
> > > >>>>>>>>>>>>>>>>>>>> "object",
> > > >>>>>>>>>>>>>>>>>>>>>> but
> > > >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > >>>>>>>>>>>>> "status"
> > > >>>>>>>>>>>>>>> of a
> > > >>>>>>>>>>>>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>>>>>>>>>>>> was
> > > >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> I agree with Chris:
> > > >>>>>>>>>>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > > >>>>>>>>>>> executor), I
> > > >>>>>>>>>>>>>> think
> > > >>>>>>>>>>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>>>>>>>>>>>> should
> > > >>>>>>>>>>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>>>>> implement
> > > >>>>>>>>>>>>>>>>>> some
> > > >>>>>>>>>>>>>>>>>>>>>> metadata
> > > >>>>>>>>>>>>>>>>>>>>>>>> operations that allow starting/stopping a group of
> > > >>>>>>>>>>> tasks
> > > >>>>>>>>>>>>>> etc.)
> > > >>>>>>>>>>>>>>>>>>>>>>>> - From the UI's View, it should be able to
> > > >> pick
> > > >>>>>>>>>>> up
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> individual
> > > >>>>>>>>>>>>>>>>>>>>>> tasks'
> > > >>>>>>>>>>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > > >>>>>>>>>> status
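Deriving a TaskGroup's display status from its tasks' statuses could be sketched as follows. The thread does not specify an aggregation rule, so the priority order used here (any failure wins, then running, then queued, then tasks with no status yet, and success only when every task succeeded) is purely an assumption, as are the state strings and the `group_status` helper name.

```python
# Assumed priority order, most "urgent" state first (not taken from the thread).
STATE_PRIORITY = ["failed", "running", "queued", "no_status", "success"]


def group_status(task_states):
    """Collapse the states of a group's tasks into one state for UI display."""
    present = set(task_states)
    for state in STATE_PRIORITY:
        if state in present:
            return state
    # Empty group, or only states this sketch does not know about.
    return "no_status"


print(group_status(["success", "running", "success"]))
print(group_status(["success", "failed", "running"]))
print(group_status(["success", "success"]))
```

Because the function only reads task states, it could live entirely on the UI side, leaving the scheduler and executor unaware of groups.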
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > > >>>>>>>>>> Imberman
> > > >>>>>>>>>>> <
> > > >>>>>>>>>>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>`
> > > >> operator
> > > >>>>>>>>>>> to
> > > >>>>>>>>>>>>> tie
> > > >>>>>>>>>>>>>>> dags
> > > >>>>>>>>>>>>>>>>>>>> together
> > > >>>>>>>>>>>>>>>>>>>>>> but
> > > >>>>>>>>>>>>>>>>>>>>>>> I
> > > >>>>>>>>>>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if
> > > >> we
> > > >>>>>>>>>>>> could
> > > >>>>>>>>>>>>>>>>>> essentially
> > > >>>>>>>>>>>>>>>>>>>>> write
> > > >>>>>>>>>>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > > >>>>>>>>>>>> starter-tasks
> > > >>>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>>>>>> DAG.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
> > > >>>>>>>>>> UI
> > > >>>>>>>>>>>>>> concept.
> > > >>>>>>>>>>>>>>>> It
> > > >>>>>>>>>>>>>>>>>>>> doesn’t
> > > >>>>>>>>>>>>>>>>>>>>>> need
> > > >>>>>>>>>>>>>>>>>>>>>>>>> to execute separately, you’re just adding
> > > >> more
> > > >>>>>>>>>>>> tasks
> > > >>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>> queue
> > > >>>>>>>>>>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>>>>>>>>>> be executed when there are resources
> > > >>>>>>>>>> available.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
> > > >>>>>>>>>> <
> > > >>>>>>>>>>>>>>>>>>> chris@crpalmer.com
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
> > > >>>>>>>>>>>>>> abstraction.
> > > >>>>>>>>>>>>>>> I
> > > >>>>>>>>>>>>>>>>>> think
> > > >>>>>>>>>>>>>>>>>>>> what
> > > >>>>>>>>>>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
> > > >>>>>>>>>> high
> > > >>>>>>>>>>>>> level
> > > >>>>>>>>>>>>>> I
> > > >>>>>>>>>>>>>>>>> think
> > > >>>>>>>>>>>>>>>>>>> you
> > > >>>>>>>>>>>>>>>>>>>>> want
> > > >>>>>>>>>>>>>>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>>>>>>>>>>> functionality:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > > >> in
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>> same
> > > >>>>>>>>>>>>>>>>>>> TaskGroup,
> > > >>>>>>>>>>>>>>>>>>>>> but
> > > >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > > >> a
> > > >>>>>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>>> either
> > > >>>>>>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > > >> in
> > > >>>>>>>>>>> any
> > > >>>>>>>>>>>>>> group
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > >>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>> either
> > > >>>>>>>>>>>>>>>>>>> other
> > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > >>>>>>>>>> as a
> > > >>>>>>>>>>>>>> single
> > > >>>>>>>>>>>>>>>>>>> "object",
> > > >>>>>>>>>>>>>>>>>>>>> but
> > > >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > >>>>>>>>>>>> "status"
> > > >>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>>>>>>>>>>> was
> > > >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
> > > >>>>>>>>>>> object
> > > >>>>>>>>>>>>>> with
> > > >>>>>>>>>>>>>>>> its
> > > >>>>>>>>>>>>>>>>>> own
> > > >>>>>>>>>>>>>>>>>>>>>> database
> > > >>>>>>>>>>>>>>>>>>>>>>>>> table and model or just another attribute on
> > > >>>>>>>>>>>> tasks.
> > > >>>>>>>>>>>>> I
> > > >>>>>>>>>>>>>>>> think
> > > >>>>>>>>>>>>>>>>>> you
> > > >>>>>>>>>>>>>>>>>>>>> could
> > > >>>>>>>>>>>>>>>>>>>>>>>>> build
> > > >>>>>>>>>>>>>>>>>>>>>>>>> it in a way such that from the scheduler's
> > > >>>>>>>>>> point
> > > >>>>>>>>>>> of
> > > >>>>>>>>>>>>>> view
> > > >>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>> DAG
> > > >>>>>>>>>>>>>>>>>>> with
> > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
> > > >>>>>>>>>> differently.
> > > >>>>>>>>>>> So
> > > >>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>> really
> > > >>>>>>>>>>>>>>>>>>> just
> > > >>>>>>>>>>>>>>>>>>>>>>> becomes
> > > >>>>>>>>>>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>>>>>>> shortcut for setting dependencies between
> > > >> sets
> > > >>>>>>>>>>> of
> > > >>>>>>>>>>>>>> Tasks,
> > > >>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>>> allows
> > > >>>>>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>> UI
> > > >>>>>>>>>>>>>>>>>>>>>>>>> to simplify the render of the DAG structure.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Chris
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > >>>>>>>>>>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>> more
> > > >>>>>>>>>>>>>>>>> important
> > > >>>>>>>>>>>>>>>>>>>> issue
> > > >>>>>>>>>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>>>>>>>> fix),
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> but I am still convinced Ash's idea is the
> > > >>>>>>>>>>> right
> > > >>>>>>>>>>>>> way
> > > >>>>>>>>>>>>>>>>> forward
> > > >>>>>>>>>>>>>>>>>>>> (just
> > > >>>>>>>>>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>>>>>>>>>> might
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
> > > >>>>>>>>>>> adding
> > > >>>>>>>>>>>>>>> visual
> > > >>>>>>>>>>>>>>>>>>> grouping
> > > >>>>>>>>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> UI).
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> There was a previous thread about this FYI
> > > >>>>>>>>>>> with
> > > >>>>>>>>>>>>> more
> > > >>>>>>>>>>>>>>>>> context
> > > >>>>>>>>>>>>>>>>>>> on
> > > >>>>>>>>>>>>>>>>>>>>> why
> > > >>>>>>>>>>>>>>>>>>>>>>>>> subdags
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > >>>>>>>>>>>>>>>>>>>>>> . A
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> solution I outline there to James's problem
> > > >>>>>>>>>> is
> > > >>>>>>>>>>>> e.g.
> > > >>>>>>>>>>>>>>>>> enabling
> > > >>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> `>>` operator
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as
> > > >>>>>>>>>>>> well. I
> > > >>>>>>>>>>>>>> see
> > > >>>>>>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>>>>>> being
> > > >>>>>>>>>>>>>>>>>>>>>>>>> separate
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> from Ash's solution for DAG grouping in the
> > > >>>>>>>>>> UI
> > > >>>>>>>>>>>> but
> > > >>>>>>>>>>>>>> one
> > > >>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>> two
> > > >>>>>>>>>>>>>>>>>>>>>> items
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> required to replace all existing subdag
> > > >>>>>>>>>>>>>> functionality.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
> > > >>>>>>>>>> and
> > > >>>>>>>>>>>>> they
> > > >>>>>>>>>>>>>>> are
> > > >>>>>>>>>>>>>>>>>>> always a
> > > >>>>>>>>>>>>>>>>>>>>>> giant
> > > >>>>>>>>>>>>>>>>>>>>>>>>> pain
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> to use. They are a constant source of user
> > > >>>>>>>>>>>>> confusion
> > > >>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>>> breakages
> > > >>>>>>>>>>>>>>>>>>>>>>>>> during
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
> > > >>>>>>>>>> Coder <
> > > >>>>>>>>>>>>>>>>>>>> jcoder01@gmail.com>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a
> > > >>>>>>>>>> UI
> > > >>>>>>>>>>>>>>> concept. I
> > > >>>>>>>>>>>>>>>>> use
> > > >>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>> subdag
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If
> > > >>>>>>>>>>> you
> > > >>>>>>>>>>>>>> have a
> > > >>>>>>>>>>>>>>>>> group
> > > >>>>>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>>>>>> tasks
> > > >>>>>>>>>>>>>>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> need to finish before another group of
> > > >>>>>>>>>> tasks
> > > >>>>>>>>>>>>>> start,
> > > >>>>>>>>>>>>>>>>> using
> > > >>>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>>> subdag
> > > >>>>>>>>>>>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies
> > > >>>>>>>>>>>> and I
> > > >>>>>>>>>>>>>>> think
> > > >>>>>>>>>>>>>>>>>> also
> > > >>>>>>>>>>>>>>>>>>>> make
> > > >>>>>>>>>>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> easier
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> to follow the dag code.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
> > > >>>>>>>>>> Hamlin
> > > >>>>>>>>>>> <
> > > >>>>>>>>>>>>>>>>>>>>> hamlin.kn@gmail.com>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > >>>>>>>>>>>>>> Berlin-Taylor
> > > >>>>>>>>>>>>>>> <
> > > >>>>>>>>>>>>>>>>>>>>>> ash@apache.org
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Question:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator
> > > >>>>>>>>>>>> anymore?
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just
> > > >>>>>>>>>>>>> replacing
> > > >>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>> with
> > > >>>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>> UI
> > > >>>>>>>>>>>>>>>>>>>>>>>>> grouping
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>> get
> > > >>>>>>>>>>>>>>>> wrong,
> > > >>>>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>>>> closer
> > > >>>>>>>>>>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> what
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> users actually want to achieve with
> > > >>>>>>>>>>>> subdags?
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in
> > > >>>>>>>>>>>> subdags
> > > >>>>>>>>>>>>>>> could
> > > >>>>>>>>>>>>>>>>>> start
> > > >>>>>>>>>>>>>>>>>>>>>> running
> > > >>>>>>>>>>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should
> > > >>>>>>>>>> we
> > > >>>>>>>>>>>> not
> > > >>>>>>>>>>>>>>> also
> > > >>>>>>>>>>>>>>>>> just
> > > >>>>>>>>>>>>>>>>>>>>>>> _entirely_
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> remove
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace
> > > >>>>>>>>>> it
> > > >>>>>>>>>>>> with
> > > >>>>>>>>>>>>>>>>> something
> > > >>>>>>>>>>>>>>>>>>>>>> simpler?
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I
> > > >>>>>>>>>>> haven't
> > > >>>>>>>>>>>>> used
> > > >>>>>>>>>>>>>>>> them
> > > >>>>>>>>>>>>>>>>>>>>>> extensively
> > > >>>>>>>>>>>>>>>>>>>>>>> so
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> may
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it
> > > >>>>>>>>>>>> has(?)
> > > >>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>> form
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own
> > > >>>>>>>>>> schedule_interval,
> > > >>>>>>>>>>>> but
> > > >>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>> has
> > > >>>>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>>> match
> > > >>>>>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> parent
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> dag
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own.
> > > >>>>>>>>>>>> (Does
> > > >>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>> make
> > > >>>>>>>>>>>>>>>>>>> sense
> > > >>>>>>>>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>>>>> do
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> this?
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the
> > > >>>>>>>>>>> sub
> > > >>>>>>>>>>>>> dag
> > > >>>>>>>>>>>>>>>> would
> > > >>>>>>>>>>>>>>>>>>> never
> > > >>>>>>>>>>>>>>>>>>>>>>>>> execute, so
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to
> > > >>>>>>>>>>>>> operator a
> > > >>>>>>>>>>>>>>>>> subdag
> > > >>>>>>>>>>>>>>>>>>> with
> > > >>>>>>>>>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>>>>>>>>>>>>> always
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> bit of a kludge.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thoughts?
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <ash@apache.org> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm excited to see how this progresses.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and it will give a flat
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> structure at the task level
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I think. At least if
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone to collect feedback on AIP-34, on
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was previously briefly mentioned in the
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be done for Airflow 2.0, and one of the
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator attach tasks back to the root DAG.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving SubDagOperator related issues by
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reattaching all tasks back to the root dag while respecting dependencies
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> during parsing. The original grouping effect on the UI will be achieved
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> through grouping related tasks by metadata.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory function more reusable because you don't
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and child_dag_name in the function signature
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> anymore.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and it will give a flat
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> structure at the task level
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts like a
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> container and most of the original methods are removed. The signature is
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also changed to *subdag_factory* with *subdag_args* and *subdag_kwargs*.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is similar to the PythonOperator signature.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group & parent_group attributes
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to BaseOperator*: This metadata is used to group tasks for rendering at
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UI level. It may potentially extend further to group arbitrary tasks
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to allow group-level operations (e.g.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> stop/trigger a group of tasks within the dag)
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to allow
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a flat structure to pair with the first
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> change instead of the original hierarchical structure.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please see related documents and PRs for details:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AIP:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any aspects that you agree/disagree with
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or need more clarification (especially the third change regarding
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup). Any comments are welcome and I am looking forward to it!
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>>>>>>>>>>> Thanks & Regards
> > > >>>>>>>>>>>>>>>>>>>>>>> Poornima
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Jarek Potiuk
> > > >>>>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal
> > > >> Software
> > > >>>>>> Engineer
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129>
> > > >> <+48660796129
> > > >>>>>>>>>>>>> <+48%20660%20796%20129>>
> > > >>>>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Jarek Potiuk
> > > >>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software
> > > >>>>> Engineer
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > >>>>>>>>>>>>> <+48%20660%20796%20129>>
> > > >>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> --
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> *Jacob Ferriero*
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Strategic Cloud Engineer: Data Engineering
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> jferriero@google.com
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> 617-714-2509
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by Gerard Casas Saez <gc...@twitter.com.INVALID>.
Agree on this being non-blocking.

Regarding moving to vote, you can take care of it. Just open a new email thread
on dev list and call for a vote. You can see this example from Tomek for
AIP-31:
https://lists.apache.org/thread.html/r16ee2b6263f5849a2e2b1b6c3ba39fb3d643022f052216cb19a0a8da%40%3Cdev.airflow.apache.org%3E

Best,


Gerard Casas Saez
Twitter | Cortex | @casassaez <http://twitter.com/casassaez>


On Thu, Aug 20, 2020 at 7:10 PM Yu Qian <yu...@gmail.com> wrote:

> Hi, Gerard, yes I agree it's possible to do this at UI level without any
> fundamental change to the implementation. If expand_group() sees that two
> groups are fully connected (i.e. every task in one parent group depends on
> every task in another parent group), it can decide to collapse all those
> children edges into a single edge between the parent groups to reduce the
> burden of the layout() function. However, I did not find any existing
> algorithm to do this within dagre so we'll likely need to implement this
> ourselves. Another hiccup is that at the moment it doesn't seem to be
> possible to call setEdge() between two parent groups (aka clusters). If
> someone has ideas how to do this please feel free to contribute.
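A minimal sketch of the "fully connected" check described above, assuming groups are given as plain sets of task ids and edges as (upstream, downstream) pairs. The function name and data shapes are hypothetical and are not part of dagre or of the PR; this only covers detecting which task-level edges could be replaced by one group-level edge, not drawing that edge between clusters.

```python
from itertools import product


def collapse_fully_connected(groups, edges):
    """Replace task-level edges between fully connected groups with one group-level edge.

    groups: dict mapping a group id to the set of task ids it contains (hypothetical shape)
    edges:  set of (upstream_task_id, downstream_task_id) pairs
    Returns (remaining_task_edges, group_edges).
    """
    remaining = set(edges)
    group_edges = set()
    for (up_id, up_tasks), (down_id, down_tasks) in product(groups.items(), repeat=2):
        if up_id == down_id or not up_tasks or not down_tasks:
            continue
        # Every task in the upstream group must point at every task downstream.
        full = set(product(up_tasks, down_tasks))
        if full <= edges:
            remaining -= full
            group_edges.add((up_id, down_id))
    return remaining, group_edges


groups = {"group1": {"a", "b"}, "group2": {"c", "d"}}
edges = {("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("start", "a")}
remaining, group_edges = collapse_fully_connected(groups, edges)
print(remaining)    # {('start', 'a')}
print(group_edges)  # {('group1', 'group2')}
```

If even one of the task-level edges is missing, the groups are left untouched, which matches the point that partially connected groups cannot be reduced to a single edge accurately.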
>
> One other consideration is that this example is only an extreme case. There
> are other in-between cases that still require user intervention. Say
> 90% of tasks in group1 depend on 90% of tasks in group2 and both groups
> have more than 100 tasks. This will still cause a lot of edges on the graph
> and it's even harder to reduce because the parent groups are not fully
> connected so it's inaccurate to reduce them to a single edge between the
> parents. In those cases, the user may still need to do something
> themselves, e.g. adding a DummyOperator to the DAG to cut down the
> edges. There will be some tradeoff because DummyOperator takes a short
> while to execute like you mentioned.
>
> There is a lot of room for improvement, but I don't think that's a
> blocking issue for this AIP? So if you can move it to the voting stage
> that'll be fantastic.
>
>
> On Thu, Aug 20, 2020 at 4:21 PM 耀 周 <zh...@icloud.com.invalid>
> wrote:
>
> > +1
> >
> > > On Aug 18, 2020, at 23:55, Gerard Casas Saez <gc...@twitter.com.INVALID> wrote:
> > >
> > > Is it not possible to solve this at the UI level? Aka tell dagre to only
> > > add 1 edge to the group instead of to all nodes in the group? No need to do
> > > SubDag behaviour, but just reduce the edges on the graph. Should reduce
> > > load time if I understand correctly.
> > >
> > > I would strongly avoid the Dummy operator since it will introduce delays on
> > > operator execution (as it will need to execute 1 dummy operator and that
> > > can be expensive imo).
> > >
> > > Overall though proposal looks good, unless anyone opposes it, I would move
> > > this to vote mode :D
> > >
> > > Gerard Casas Saez
> > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >
> > >
> > > On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yu...@gmail.com> wrote:
> > >
> > >> Hi, All,
> > > >> Here's the updated AIP-34
> > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> > > >> The PR has been fine-tuned with better UI interactions and added
> > > >> serialization of TaskGroup: https://github.com/apache/airflow/pull/10153
> > >>
> > > >> Here are some experiment results:
> > > >> A made-up dag containing 403 tasks and 5696 edges, grouped as shown in the
> > > >> code at the end of this message. Note that inside_section_2 is
> > > >> intentionally made to depend on all tasks in inside_section_1 to generate
> > > >> a large number of edges. The observation is that opening the top level
> > > >> graph is very quick, around 270ms. Expanding groups that don't have a lot
> > > >> of dense dependencies on other groups is also hardly noticeable, e.g.
> > > >> expanding section_1 takes 330ms. The part that takes time is expanding
> > > >> both inside_section_1 and inside_section_2. Because there are 2500 edges
> > > >> between these two inner groups, it took 63 seconds to expand both of them.
> > > >> The majority of the time (more than 62 seconds) is actually taken by the
> > > >> layout() function in dagre. In other words, it's very fast to add nodes
> > > >> and edges, but laying them out on the graph takes time. This issue is not
> > > >> actually a problem specific to TaskGroup. Without TaskGroup, if a DAG
> > > >> contains too many edges, it takes time to lay out the graph too.
> > >>
> > > >> On the other hand, a more realistic experiment with a production DAG
> > > >> containing about 400 tasks and 700 edges showed that grouping tasks into
> > > >> three levels of nested TaskGroup cut the upfront page opening time from
> > > >> around 6s to 500ms. (Obviously the time is paid back when the user gradually
> > > >> expands all the groups one by one, but normally people don't need to expand
> > > >> every group every time so it's still a big saving.) The experiments were
> > > >> done on OS X Mojave, 2.2 GHz Intel Core i7, 16GB memory, Chrome.
> > >>
> > > >> I can see a few possible improvements to TaskGroup (or how it's used) that
> > > >> can be done as a next step:
> > > >> 1). Like Gerard suggested, we can implement lazy-loading. Instead of
> > > >> displaying the whole DAG, we can limit the Graph View to show only a single
> > > >> TaskGroup, omitting its edges going out to other TaskGroups. This behaviour
> > > >> is more like SubDagOperator, where users can zoom into/out of a TaskGroup
> > > >> and look at only tasks within that TaskGroup as if those are the only tasks
> > > >> on the DAG. This can be done with either background javascript calls or by
> > > >> making a new get request with filtering parameters. Obviously the downside
> > > >> is that it's not as explicit as showing all the dependencies on the graph.
> > > >> 2). Users can improve the organization of the DAG themselves to reduce the
> > > >> number of edges. E.g. if every task in group2 depends on every task in
> > > >> group1, instead of doing group1 >> group2, they can add a DummyOperator in
> > > >> between and do this: group1 >> dummy >> group2. This cuts down the number
> > > >> of edges significantly and page load becomes much faster.
> > > >> 3). If we really want, we can improve the >> operator of TaskGroup to do 2)
> > > >> automatically. If it sees that both sides of >> are TaskGroup, it can
> > > >> create a DummyOperator on behalf of the user. The downside is that it may
> > > >> be too much magic.
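To make the saving in 2) concrete, here is a small edge-counting sketch in plain Python. It does not use Airflow; the helper names and the "dummy" node id are made up for illustration. The group sizes of 50 match the inner groups in the experiment above.

```python
from itertools import product


def direct_edges(upstream, downstream):
    """Edges created by group1 >> group2 when every task depends on every task."""
    return list(product(upstream, downstream))


def edges_via_dummy(upstream, downstream, dummy="dummy"):
    """Edges created by group1 >> dummy >> group2 with a single node in between."""
    return [(u, dummy) for u in upstream] + [(dummy, d) for d in downstream]


group1 = [f"group1.task-{i + 1}" for i in range(50)]
group2 = [f"group2.task-{i + 1}" for i in range(50)]

print(len(direct_edges(group1, group2)))     # 2500
print(len(edges_via_dummy(group1, group2)))  # 100
```

In general the direct form needs N * M edges while the intermediate node needs N + M, at the cost of one extra task to schedule.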
> > >>
> > >> Thanks,
> > >> Qian
> > >>
> > > >> def create_section():
> > > >>     """
> > > >>     Create tasks in the outer section.
> > > >>     """
> > > >>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
> > > >>
> > > >>     with TaskGroup("inside_section_1") as inside_section_1:
> > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> > > >>
> > > >>     with TaskGroup("inside_section_2") as inside_section_2:
> > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> > > >>
> > > >>     dummies[-1] >> inside_section_1
> > > >>     dummies[-2] >> inside_section_2
> > > >>     inside_section_1 >> inside_section_2
> > > >>
> > > >>
> > > >> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> > > >>     start = DummyOperator(task_id="start")
> > > >>
> > > >>     with TaskGroup("section_1") as section_1:
> > > >>         create_section()
> > > >>
> > > >>     some_other_task = DummyOperator(task_id="some-other-task")
> > > >>
> > > >>     with TaskGroup("section_2") as section_2:
> > > >>         create_section()
> > > >>
> > > >>     end = DummyOperator(task_id='end')
> > > >>
> > > >>     start >> section_1 >> some_other_task >> section_2 >> end
> > >>
> > >>
> > >> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
> > >> <gc...@twitter.com.invalid> wrote:
> > >>
> > > >>> Re graph times. That makes sense. Let me know what you find. We may be able
> > > >>> to contribute on the lazy loading part.
> > > >>>
> > > >>> Looking forward to seeing the updated AIP!
> > >>>
> > >>>
> > >>> Gerard Casas Saez
> > >>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >>>
> > >>>
> > >>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <ka...@gmail.com>
> > wrote:
> > >>>
> > >>>> Permissions granted, let me know if you face any issues.
> > >>>>
> > >>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yu...@gmail.com>
> wrote:
> > >>>>
> > >>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> > >>>>>
> > >>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <ka...@gmail.com>
> > >>> wrote:
> > >>>>>
> > > >>>>>> What's your ID i.e. if you haven't created an account yet, please create
> > > >>>>>> one at https://cwiki.apache.org/confluence/signup.action and send us your
> > > >>>>>> ID and we will add permissions.
> > >>>>>>
> > >>>>>> Thanks. I'll edit the AIP. May I request permission to edit it?
> > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yu...@gmail.com>
> > >>> wrote:
> > >>>>>>
> > > >>>>>>> Re, Xinbin. Thanks. I'll edit the AIP. May I request permission to edit it?
> > > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > >>>>>>>
> > > >>>>>>> Re Gerard: yes the UI loads all the nodes as json from the web server at
> > > >>>>>>> once. However, it only adds the top level nodes and edges to the graph when
> > > >>>>>>> the Graph View page is first opened, and then adds the expanded nodes to
> > > >>>>>>> the graph as the user expands them. From what I've experienced with DAGs
> > > >>>>>>> containing around 400 tasks (not using TaskGroup or SubDagOperator),
> > > >>>>>>> opening the whole dag in Graph View usually takes 5 seconds. Less than 60ms
> > > >>>>>>> of that is taken by loading the data from webserver. The remaining 4.9s+ is
> > > >>>>>>> taken by javascript functions in dagre-d3.min.js such as createNodes,
> > > >>>>>>> createEdgeLabels, etc and by rendering the graph. With TaskGroup being used
> > > >>>>>>> to group tasks into a smaller number of top-level nodes, the amount of data
> > > >>>>>>> loaded from webserver will remain about the same compared to a flat dag of
> > > >>>>>>> the same size, but the number of nodes and edges that need to be plotted on
> > > >>>>>>> the graph can be reduced significantly. So in theory this should speed up the
> > > >>>>>>> time it takes to open Graph View even without lazy-loading the data (I'll
> > > >>>>>>> experiment to find out). That said, if it comes to a point where lazy-loading
> > > >>>>>>> helps, we can still implement it as an improvement.
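As a rough illustration of why collapsed groups reduce what has to be drawn, the sketch below computes the edges visible for a given set of expanded groups. It is plain Python with hypothetical names and data shapes, not the JavaScript in the PR: each task edge is mapped to the outermost collapsed ancestor of its endpoints, and edges that end up inside one collapsed group disappear.

```python
def visible_node(node_id, parents, expanded):
    """Return the outermost collapsed group containing node_id, or node_id itself.

    parents:  dict mapping a node id to its enclosing group id (None at top level)
    expanded: set of group ids the user has expanded
    """
    chain = []
    node = node_id
    while node is not None:
        chain.append(node)
        node = parents.get(node)
    # Walk from the outermost group inward and stop at the first collapsed one.
    for group in reversed(chain[1:]):
        if group not in expanded:
            return group
    return node_id


def visible_edges(edges, parents, expanded):
    """Map task-level edges onto whatever nodes are currently shown."""
    shown = set()
    for up, down in edges:
        u = visible_node(up, parents, expanded)
        d = visible_node(down, parents, expanded)
        if u != d:  # edges inside a collapsed group are not drawn
            shown.add((u, d))
    return shown


parents = {"start": None, "s1": None, "s1.a": "s1", "s1.b": "s1"}
edges = {("start", "s1.a"), ("start", "s1.b"), ("s1.a", "s1.b")}

print(visible_edges(edges, parents, expanded=set()))  # {('start', 's1')}
print(len(visible_edges(edges, parents, expanded={"s1"})))  # 3
```

With every group collapsed, three task edges shrink to one drawn edge; the full set only reappears once the group is expanded.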
> > >>>>>>>
> > > >>>>>>> Re James: the Tree View looks as if all the groups are fully expanded
> > > >>>>>>> (because under the hood all the tasks are in a single DAG). I'm less
> > > >>>>>>> worried about Tree View at the moment because it already has a mechanism
> > > >>>>>>> for collapsing tasks by the dependency tree. That said, the Tree View can
> > > >>>>>>> definitely be improved too with TaskGroup (e.g. collapse tasks in the same
> > > >>>>>>> TaskGroup when Tree View is first opened).
> > >>>>>>>
> > > >>>>>>> For both suggestions, implementing them doesn't require fundamental changes
> > > >>>>>>> to the idea. I think we can have a basic working TaskGroup first, and then
> > > >>>>>>> improve it incrementally in several PRs as we get more feedback from the
> > > >>>>>>> community. What do you think?
> > >>>>>>>
> > >>>>>>> Qian
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jc...@gmail.com>
> > >>>>> wrote:
> > >>>>>>>
> > > >>>>>>>> I agree this looks great, one question, how does the tree view look?
> > >>>>>>>>
> > >>>>>>>> James Coder
> > >>>>>>>>
> > > >>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez
> > > >>>>>>>>> <gcasassaez@twitter.com.invalid> wrote:
> > >>>>>>>>>
> > >>>>>>>>> First of all, this is awesome!!
> > >>>>>>>>>
> > > >>>>>>>>> Secondly, checking your UI code, it seems you are loading all operators at
> > > >>>>>>>>> once. Wondering if we can load them as needed (aka load whenever we click
> > > >>>>>>>>> the TaskGroup). Some of our DAGs are so large that they take forever to load
> > > >>>>>>>>> on the Graph view, so worried about this still being an issue here. It may
> > > >>>>>>>>> be easily solvable by implementing lazy loading of the graph. Not sure how
> > > >>>>>>>>> easy to implement/add to the UI extension (and don't want to push for early
> > > >>>>>>>>> optimization as it's the root of all evil).
> > >>>>>>>>> Gerard Casas Saez
> > >>>>>>>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <
> > >>>>>> bin.huangxb@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Hi Yu,
> > >>>>>>>>>>
> > > >>>>>>>>>> Thank you so much for taking on this. I was fairly distracted previously
> > > >>>>>>>>>> and I didn't have the time to update the proposal. In fact, after
> > > >>>>>>>>>> discussing with Ash, Kaxil and Daniel, the direction of this AIP has been
> > > >>>>>>>>>> changed to favor the concept of TaskGroup instead of rewriting
> > > >>>>>>>>>> SubDagOperator (though it may make sense to deprecate SubDag at a future
> > > >>>>>>>>>> date).
> > > >>>>>>>>>>
> > > >>>>>>>>>> Your PR is amazing and it has implemented the desired features. I think we
> > > >>>>>>>>>> can focus on your new PR instead. Do you mind updating the AIP based on
> > > >>>>>>>>>> what you have done in your PR?
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Bin
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <
> > >>> yuqian1990@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>>>>>
> > > >>>>>>>>>>> Hi, all, I've added the basic UI changes to my proposed implementation of
> > > >>>>>>>>>>> TaskGroup as UI grouping concept:
> > > >>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > >>>>>>>>>>>
> > > >>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup so I'm quoting
> > > >>>>>>>>>>> it here. The only thing I don't fully agree with is the restriction
> > > >>>>>>>>>>> "... *cannot* have dependencies between a Task in a TaskGroup and either a
> > > >>>>>>>>>>> Task in a different TaskGroup or a Task not in any group". I think
> > > >>>>>>>>>>> this is over-restrictive. Since TaskGroup is a UI concept, tasks can have
> > > >>>>>>>>>>> dependencies on tasks in other TaskGroups or not in any TaskGroup. In my PR,
> > > >>>>>>>>>>> this is allowed. The graph edges will update accordingly when TaskGroups
> > > >>>>>>>>>>> are expanded/collapsed. TaskGroup is only helping to make the UI look less
> > > >>>>>>>>>>> crowded. Under the hood, everything is still a DAG of tasks and edges so
> > > >>>>>>>>>>> things work normally. Here's a screenshot
> > > >>>>>>>>>>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > > >>>>>>>>>>> of the UI interaction.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > > >>>>>>>>>>>   - Tasks can be added to a TaskGroup
> > > >>>>>>>>>>>   - You *can* have dependencies between Tasks in the same TaskGroup, but
> > > >>>>>>>>>>>     *cannot* have dependencies between a Task in a TaskGroup and either a
> > > >>>>>>>>>>>     Task in a different TaskGroup or a Task not in any group
> > > >>>>>>>>>>>   - You *can* have dependencies between a TaskGroup and either other
> > > >>>>>>>>>>>     TaskGroups or Tasks not in any group
> > > >>>>>>>>>>>   - The UI will by default render a TaskGroup as a single "object", but
> > > >>>>>>>>>>>     which you expand or zoom into in some way
> > > >>>>>>>>>>>   - You'd need some way to determine what the "status" of a TaskGroup was
> > > >>>>>>>>>>>     at least for UI display purposes
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Regarding Jake's comment, I agree it's possible to implement the
> > >>>>>>>>>>> "retrying tasks in a group" pattern he mentioned as an optional
> > >>>>>>>>>>> feature of TaskGroup, although that may go against having TaskGroup
> > >>>>>>>>>>> as a pure UI concept. For the motivating example Jake provided, I
> > >>>>>>>>>>> suggest implementing both SubmitLongRunningJobTask and
> > >>>>>>>>>>> PollJobStatusSensor in a single operator. It can do something like
> > >>>>>>>>>>> what BaseSensorOperator.execute() does in "reschedule" mode, i.e. it
> > >>>>>>>>>>> first executes some code to submit the long-running job to the
> > >>>>>>>>>>> external service and stores the state (e.g. in XCom), then
> > >>>>>>>>>>> reschedules itself. Subsequent runs then poke for the completion
> > >>>>>>>>>>> state.
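
The control flow described above (submit once, store the job id, then only poll on later runs) can be sketched in plain Python. This is only a toy model under stated assumptions: FakeJobService, SubmitAndPollOperator, Reschedule and the xcom dict are hypothetical stand-ins, not Airflow classes, and run_until_done merely imitates a scheduler re-running a rescheduled task.

```python
class Reschedule(Exception):
    """Hypothetical signal: give up the slot and run this task again later."""


class FakeJobService:
    """Pretend external service; a job is done after `polls_needed` checks."""

    def __init__(self, polls_needed):
        self.polls_needed = polls_needed
        self.submitted = 0
        self.polls = 0

    def submit(self):
        self.submitted += 1
        return "job-%d" % self.submitted

    def is_done(self, job_id):
        self.polls += 1
        return self.polls >= self.polls_needed


class SubmitAndPollOperator:
    """Single operator that both submits the job and polls for completion."""

    def __init__(self, service):
        self.service = service

    def execute(self, xcom):
        job_id = xcom.get("job_id")
        if job_id is None:
            # First run: submit, remember the job id, then free the slot.
            xcom["job_id"] = self.service.submit()
            raise Reschedule()
        if not self.service.is_done(job_id):
            # Later runs: only poke for completion, never resubmit.
            raise Reschedule()
        return job_id


def run_until_done(operator, xcom, max_runs=10):
    """Stand-in for the scheduler re-running a rescheduled task."""
    for run in range(1, max_runs + 1):
        try:
            return operator.execute(xcom), run
        except Reschedule:
            continue
    raise RuntimeError("job did not finish in time")


service = FakeJobService(polls_needed=2)
result, runs = run_until_done(SubmitAndPollOperator(service), xcom={})
```

Because the job id lives in the stored state rather than in the process, a retry of the whole operator could clear that state to force a fresh submit, which is roughly the behaviour Jake gets from retrying a SubDag today.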
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero <jferriero@google.com.invalid> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I really like this idea of a TaskGroup container as I think this will be
> > >>>>>>>>>>>> much easier to use than SubDag.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'd like to propose an optional behavior for special retry mechanics via a
> > >>>>>>>>>>>> TaskGroup.retry_all property.
> > >>>>>>>>>>>> This way I could use TaskGroup to replace my favorite use of SubDag for
> > >>>>>>>>>>>> atomically retrying tasks of the pattern "act on external state then
> > >>>>>>>>>>>> reschedule poll until desired state reached".
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The motivating use case I have for a SubDag is a very simple two-task group
> > >>>>>>>>>>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > >>>>>>>>>>>> I use SubDag because it gives me an easy way to retry the SubmitJobTask
> > >>>>>>>>>>>> if something about the PollJobSensor fails.
> > >>>>>>>>>>>> This pattern would be really nice for jobs that are expected to run a long
> > >>>>>>>>>>>> time (because the sensor can use reschedule mode, freeing up slots)
> > >>>>>>>>>>>> but might fail for a retryable reason.
> > >>>>>>>>>>>> However, using SubDag to meet this use case defeats the purpose because
> > >>>>>>>>>>>> SubDag infamously
> > >>>>>>>>>>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> > >>>>>>>>>>>> blocks a "controller" slot for the entire duration.
> > >>>>>>>>>>>> This may feel like a cyclic behavior, but in reality it is very common for a
> > >>>>>>>>>>>> single operator to submit a job and wait until it is done.
> > >>>>>>>>>>>> We could use this case to refactor many operators (e.g. BQ, Dataproc,
> > >>>>>>>>>>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with an
> > >>>>>>>>>>>> optional reschedule mode if the user knows that this job may take a long
> > >>>>>>>>>>>> time.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I'd be happy to do the development work on adding this specific retry
> > >>>>>>>>>>>> behavior to TaskGroup once the base concept is implemented, if others in
> > >>>>>>>>>>>> the community would find this a useful feature.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>> Jake
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> > >>>>>>>> Jarek.Potiuk@polidea.com
> > >>>>>>>>>>>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> All for it :) . I think we are getting closer to having regular planning,
> > >>>>>>>>>>>>> taking a structured approach to 2.0, and starting a task force for it
> > >>>>>>>>>>>>> soon, so I think it should be perfectly fine to discuss and even start
> > >>>>>>>>>>>>> implementing what's beyond, as long as we make sure that we are
> > >>>>>>>>>>>>> prioritizing 2.0 work.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> J,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <
> > >>>> yuqian1990@gmail.com>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi Jarek,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I agree we should not change the behaviour of the existing SubDagOperator
> > >>>>>>>>>>>>>> till Airflow 2.1. Is it okay to continue the discussion about TaskGroup as
> > >>>>>>>>>>>>>> a brand new concept/feature independent from the existing SubDagOperator?
> > >>>>>>>>>>>>>> In other words, shall we add TaskGroup as a UI grouping concept like Ash
> > >>>>>>>>>>>>>> suggested, and not touch SubDagOperator at all? Whenever we are ready with
> > >>>>>>>>>>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I really like Ash's idea of simplifying the SubDagOperator idea into a
> > >>>>>>>>>>>>>> simple UI grouping concept. I think Xinbin's idea of "reattaching all the
> > >>>>>>>>>>>>>> tasks to the root DAG" is the way to go. And I see James pointed out we
> > >>>>>>>>>>>>>> need some helper functions to simplify dependencies setting of TaskGroup.
> > >>>>>>>>>>>>>> Xinbin put up a pretty elegant example in his PR
> > >>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I think having TaskGroup as
> > >>>>>>>>>>>>>> a UI concept should be a relatively small change. We can simplify Xinbin's
> > >>>>>>>>>>>>>> PR further. So I put up this alternative proposal here:
> > >>>>>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I have not done any UI changes due to lack of experience with web UI. If
> > >>>>>>>>>>>>>> anyone's interested, please take a look at the PR.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Qian
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > >>>>>>>>>>> Jarek.Potiuk@polidea.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Similar point here to the other ideas that are popping up. Maybe we should
> > >>>>>>>>>>>>>>> just focus on completing 2.0 and move all discussions about further
> > >>>>>>>>>>>>>>> improvements to 2.1? While those are important discussions (and we should
> > >>>>>>>>>>>>>>> continue them in the near future!) I think delivering 2.0 in its current
> > >>>>>>>>>>>>>>> shape should be our focus now.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> J.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > >>>>>>>>>>> bin.huangxb@gmail.com>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Hi Daniel
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same API as a DAG object related
> > >>>>>>>>>>>>>>>> to task dependencies, but it will not have anything related to actual
> > >>>>>>>>>>>>>>>> execution or scheduling.
> > >>>>>>>>>>>>>>>> I will update the AIP according to this over the weekend.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when you import the object
> > >>>>>>>>>>>>>>>>> you can import it with parameters to determine the shape of the DAG.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it serve a similar purpose as a
> > >>>>>>>>>>>>>>>> DAG factory function?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > >>>>>>>>>>>>>>> daniel.imberman@gmail.com
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Hi Bin,
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g. the bitwise
> > >>>>>>>>>>>>>>>>> operator for task dependencies)? We could even make a “DAGTemplate” object
> > >>>>>>>>>>>>>>>>> s.t. when you import the object you can import it with parameters to
> > >>>>>>>>>>>>>>>>> determine the shape of the DAG.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > >>>>>>>>>>>>> bin.huangxb@gmail.com
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>> The TaskGroup will not take a schedule interval as a parameter itself; it
> > >>>>>>>>>>>>>>>>> depends on the DAG it is attached to. In my opinion, the TaskGroup will
> > >>>>>>>>>>>>>>>>> only contain a group of tasks with interdependencies, and the TaskGroup
> > >>>>>>>>>>>>>>>>> behaves like a task. It doesn't contain any execution/scheduling logic
> > >>>>>>>>>>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs etc.) like a DAG
> > >>>>>>>>>>>>>>>>> does.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> For example, there is the scenario that the schedule interval of DAG is
> > >>>>>>>>>>>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 min.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case that you want to
> > >>>>>>>>>>>>>>>>> achieve?
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> > >>>>>>>>>> thanosxnicholas@gmail.com
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Hi Bin,
> > >>>>>>>>>>>>>>>>>> Using TaskGroup, is the schedule interval of TaskGroup the same as the
> > >>>>>>>>>>>>>>>>>> parent DAG? My main concern is whether the schedule interval of TaskGroup
> > >>>>>>>>>>>>>>>>>> could be different from that of the DAG. For example, there is the
> > >>>>>>>>>>>>>>>>>> scenario that the schedule interval of DAG is 1 hour and the schedule
> > >>>>>>>>>>>>>>>>>> interval of TaskGroup is 20 min.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>> Nicholas
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > >>>>>>>>>>>>>> bin.huangxb@gmail.com
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Hi Nicholas,
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of SubDagOperator, maybe it will
> > >>>>>>>>>>>>>>>>>>> throw an error? But in the original proposal, the subdag's
> > >>>>>>>>>>>>>>>>>>> schedule_interval will be ignored. Or if we decide to use TaskGroup to
> > >>>>>>>>>>>>>>>>>>> replace SubDag, there will be no subdag schedule_interval.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > >>>>>>>>>>>> thanosxnicholas@gmail.com
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Hi Bin,
> > >>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was confused about whether the schedule
> > >>>>>>>>>>>>>>>>>>>> interval of SubDAG can differ from that of the parent DAG. I have
> > >>>>>>>>>>>>>>>>>>>> discussed the schedule interval of SubDAG with Jiajie Zhong. If the
> > >>>>>>>>>>>>>>>>>>>> SubDagOperator has a different schedule interval, what will happen when
> > >>>>>>>>>>>>>>>>>>>> the scheduler schedules the parent DAG?
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>>>>>> Nicholas Jiang
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > >>>>>>>>>>>>>>>> bin.huangxb@gmail.com>
> > >>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone, for your feedback!
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> I have rethought the concept of subdag and task groups. I think the
> > >>>>>>>>>>>>>>>>>>>>> better way to approach this is to entirely remove subdag and introduce
> > >>>>>>>>>>>>>>>>>>>>> the concept of TaskGroup, which is a container of tasks along with their
> > >>>>>>>>>>>>>>>>>>>>> dependencies *without the execution/scheduling logic of a DAG*. The only
> > >>>>>>>>>>>>>>>>>>>>> purpose of it is to group a list of tasks, but you still need to add it
> > >>>>>>>>>>>>>>>>>>>>> to a DAG for execution.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Here is a small code snippet.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> ```
> > >>>>>>>>>>>>>>>>>>>>> class TaskGroup:
> > >>>>>>>>>>>>>>>>>>>>>     """
> > >>>>>>>>>>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> > >>>>>>>>>>>>>>>>>>>>>     """
> > >>>>>>>>>>>>>>>>>>>>>     def __init__(self, group_id, default_args):
> > >>>>>>>>>>>>>>>>>>>>>         pass
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> """
> > >>>>>>>>>>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to a DAG
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> This can be declared in a separate file from the dag file
> > >>>>>>>>>>>>>>>>>>>>> """
> > >>>>>>>>>>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
> > >>>>>>>>>>>>>>>>>>>>> download_group.add_task(task1)
> > >>>>>>>>>>>>>>>>>>>>> task2.dag = download_group
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> with download_group:
> > >>>>>>>>>>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> [task1, task2] >> task3
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> """Add it to a DAG for execution"""
> > >>>>>>>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> > >>>>>>>>>>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
> > >>>>>>>>>>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> > >>>>>>>>>>>>>>>>>>>>>     start >> download_group
> > >>>>>>>>>>>>>>>>>>>>>     # this is equivalent to
> > >>>>>>>>>>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> > >>>>>>>>>>>>>>>>>>>>> ```
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> With this, we can still reuse a group of tasks and set dependencies
> > >>>>>>>>>>>>>>>>>>>>> between them; it avoids the boilerplate code from using SubDagOperator,
> > >>>>>>>>>>>>>>>>>>>>> and we can declare dependencies as `task >> task_group >> task`.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> User migration wise, we can introduce it before Airflow 2.0 and allow
> > >>>>>>>>>>>>>>>>>>>>> gradual transition. Then we can decide if we still want to keep the
> > >>>>>>>>>>>>>>>>>>>>> SubDagOperator or simply remove it.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Any thoughts?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime
> > >> Beauchemin <
> > >>>>>>>>>>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> +1, proposal looks good.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> The original intention was really to have task groups and a zoom-in/out
> > >>>>>>>>>>>>>>>>>>>>>> in the UI. The original reasoning was to reuse the DAG object since it
> > >>>>>>>>>>>>>>>>>>>>>> is a group of tasks, but as highlighted here it does create underlying
> > >>>>>>>>>>>>>>>>>>>>>> confusion since a DAG is much more than just a group of tasks.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Max
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > >>>>>>>>>>>>>>>>>>>>> joshipoornima06@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Thank you for your email.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > >>>>>>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> > >>>>>>>>>>>>>>>>>>>>>>>>>>   *DagBag.bag_dag* method to unpack subdag while parsing, and it will
> > >>>>>>>>>>>>>>>>>>>>>>>>>>   give a flat structure at the task level
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I think. At least
> > >>>>>>>>>>>>>>>>>>>>>>>>> if I've understood your idea here correctly.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> I am not sure about the serialized_dag representation, but at least it
> > >>>>>>>>>>>>>>>>>>>>>>>> will still keep the subdag entry in the DAG table? In my proposal, as in
> > >>>>>>>>>>>>>>>>>>>>>>>> the draft PR, the idea is to *extract the tasks from the subdag and add
> > >>>>>>>>>>>>>>>>>>>>>>>> them back to the root_dag*. So the runtime DAG graph will look exactly
> > >>>>>>>>>>>>>>>>>>>>>>>> the same as without subdag, but with metadata attached to those
> > >>>>>>>>>>>>>>>>>>>>>>>> sections. This metadata will later be used for rendering in the UI. So
> > >>>>>>>>>>>>>>>>>>>>>>>> after parsing (*DagBag.process_file()*), it will just output the
> > >>>>>>>>>>>>>>>>>>>>>>>> *root_dag* instead of *root_dag + subdag + subdag + nested subdag* etc.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata current_group=section-1,
> > >>>>>>>>>>>>>>>>>>>>>>>>   parent_group=<the-root-dag-id> (naming suggestions welcome); the
> > >>>>>>>>>>>>>>>>>>>>>>>>   reason for parent_group is that we can have nested groups and still
> > >>>>>>>>>>>>>>>>>>>>>>>>   be able to capture the dependency.
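
As a rough illustration of the flattening idea, the sketch below turns a nested structure into a flat task map where each task carries current_group and parent_group metadata. The nested-dict input shape, the flatten helper and the ids used are assumptions made up for this example; only the two metadata names come from the message above.

```python
def flatten(group_id, node, parent_group=None):
    """Return a flat {task_id: metadata} dict for a nested dag/subdag dict."""
    flat = {}
    for task_id in node.get("tasks", []):
        flat[task_id] = {"current_group": group_id, "parent_group": parent_group}
    for child_id, child in node.get("subdags", {}).items():
        # Tasks of a nested subdag are lifted to the same flat level, but
        # remember which group they came from and which group contains it.
        flat.update(flatten(child_id, child, parent_group=group_id))
    return flat


# Hypothetical root DAG with one subdag that itself contains a nested subdag.
root = {
    "tasks": ["start", "end"],
    "subdags": {
        "section-1": {
            "tasks": ["section-1-task-1", "section-1-task-2"],
            "subdags": {
                "section-1-inner": {"tasks": ["section-1-inner-task-1"]},
            },
        },
    },
}

flat = flatten("example_dag", root)
```

Here flat["section-1-task-1"] carries current_group "section-1" and parent_group "example_dag", so a UI could rebuild the nesting from the flat task list alone.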
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Runtime DAG:
> > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> While at the UI, what we see would be something like this by utilizing
> > >>>>>>>>>>>>>>>>>>>>>>>> the metadata, and then we can expand or zoom in in some way.
> > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> The benefits I can see are:
> > >>>>>>>>>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra complexity of SubDag for
> > >>>>>>>>>>>>>>>>>>>>>>>>    execution and scheduling. It will be the same as not using SubDag.
> > >>>>>>>>>>>>>>>>>>>>>>>> 2. We still have the benefits of modularized and reusable dag code and
> > >>>>>>>>>>>>>>>>>>>>>>>>    can declare dependencies between them. And with the new
> > >>>>>>>>>>>>>>>>>>>>>>>>    SubDagOperator (see AIP or draft PR), we can use the same dag_factory
> > >>>>>>>>>>>>>>>>>>>>>>>>    function for generating 1 dag, a lot of dynamic dags, or a SubDag (in
> > >>>>>>>>>>>>>>>>>>>>>>>>    this case, it will just extract all underlying tasks and append them
> > >>>>>>>>>>>>>>>>>>>>>>>>    to the root dag).
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag with a simpler concept
> > >>>>>>>>>>>>>>>>>>>>>>>>   by Ash: the proposed change basically drains out the contents of a
> > >>>>>>>>>>>>>>>>>>>>>>>>   SubDag and becomes more like
> > >>>>>>>>>>>>>>>>>>>>>>>>   ExtractSubdagTasksAndAppendToRootdagOperator (forgive me about the
> > >>>>>>>>>>>>>>>>>>>>>>>>   crazy name..). In this case, is it still necessary to keep the
> > >>>>>>>>>>>>>>>>>>>>>>>>   concept of subdag, as it is nothing more than a name?
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks Chris Palmer for helping
> > >>>>>>>>>>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup, I will just paste it
> > >>>>>>>>>>>>>>>>>>>>>>>> here.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in the same TaskGroup,
> > >>>>>>>>>>>>>>>>>>>>>>>>>   but *cannot* have dependencies between a Task in a TaskGroup and
> > >>>>>>>>>>>>>>>>>>>>>>>>>   either a Task in a different TaskGroup or a Task not in any group
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either other
> > >>>>>>>>>>>>>>>>>>>>>>>>>   TaskGroups or Tasks not in any group
> > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup as a single "object",
> > >>>>>>>>>>>>>>>>>>>>>>>>>   but which you expand or zoom into in some way
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the "status" of a TaskGroup
> > >>>>>>>>>>>>>>>>>>>>>>>>>   was at least for UI display purposes
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> I agree with Chris:
> > >>>>>>>>>>>>>>>>>>>>>>>> - From the backend's view (scheduler & executor), I think TaskGroup
> > >>>>>>>>>>>>>>>>>>>>>>>>   should be ignored during execution (unless we decide to implement
> > >>>>>>>>>>>>>>>>>>>>>>>>   some metadata operations that allow starting/stopping a group of
> > >>>>>>>>>>>>>>>>>>>>>>>>   tasks etc.).
> > >>>>>>>>>>>>>>>>>>>>>>>> - From the UI's view, it should be able to pick up the individual
> > >>>>>>>>>>>>>>>>>>>>>>>>   tasks' status and then determine the TaskGroup's status.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > >>>>>>>>>> Imberman
> > >>>>>>>>>>> <
> > >>>>>>>>>>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator to tie dags together
> > >>>>>>>>>>>>>>>>>>>>>>>>> but I think that sounds pretty great! I wonder if we could
> > >>>>>>>>>>>>>>>>>>>>>>>>> essentially write in the ability to set dependencies to all
> > >>>>>>>>>>>>>>>>>>>>>>>>> starter-tasks for that DAG.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly UI concept. It doesn’t
> > >>>>>>>>>>>>>>>>>>>>>>>>> need to execute separately, you’re just adding more tasks to the
> > >>>>>>>>>>>>>>>>>>>>>>>>> queue that will be executed when there are resources available.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <chris@crpalmer.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex abstraction. I think what
> > >>>>>>>>>>>>>>>>>>>>>>>>> is needed/useful is a TaskGroup concept. On a high level I think you
> > >>>>>>>>>>>>>>>>>>>>>>>>> want this functionality:
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in the same TaskGroup,
> > >>>>>>>>>>>>>>>>>>>>>>>>>   but *cannot* have dependencies between a Task in a TaskGroup and
> > >>>>>>>>>>>>>>>>>>>>>>>>>   either a Task in a different TaskGroup or a Task not in any group
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either other
> > >>>>>>>>>>>>>>>>>>>>>>>>>   TaskGroups or Tasks not in any group
> > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup as a single "object",
> > >>>>>>>>>>>>>>>>>>>>>>>>>   but which you expand or zoom into in some way
> > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the "status" of a TaskGroup
> > >>>>>>>>>>>>>>>>>>>>>>>>>   was at least for UI display purposes
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Not sure if it would need to be a top level object with its own
> > >>>>>>>>>>>>>>>>>>>>>>>>> database table and model or just another attribute on tasks. I think
> > >>>>>>>>>>>>>>>>>>>>>>>>> you could build it in a way such that from the scheduler's point of
> > >>>>>>>>>>>>>>>>>>>>>>>>> view a DAG with TaskGroups doesn't get treated any differently. So it
> > >>>>>>>>>>>>>>>>>>>>>>>>> really just becomes a shortcut for setting dependencies between sets
> > >>>>>>>>>>>>>>>>>>>>>>>>> of Tasks, and allows the UI to simplify the rendering of the DAG
> > >>>>>>>>>>>>>>>>>>>>>>>>> structure.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Chris
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov <ddavydov@twitter.com.invalid>
> > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> Agree with James (and think it's actually the more important issue to fix),
> > >>>>>>>>>>>>>>>>>>>>>>>>>> but I am still convinced Ash's idea is the right way forward (just it might
> > >>>>>>>>>>>>>>>>>>>>>>>>>> require a bit more work to deprecate than adding visual grouping in the UI).
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> There was a previous thread about this FYI with more context on why subdags
> > >>>>>>>>>>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > >>>>>>>>>>>>>>>>>>>>>>>>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html . A
> > >>>>>>>>>>>>>>>>>>>>>>>>>> solution I outline there to James's problem is e.g. enabling the >> operator
> > >>>>>>>>>>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as well. I see this being separate
> > >>>>>>>>>>>>>>>>>>>>>>>>>> from Ash's solution for DAG grouping in the UI but one of the two items
> > >>>>>>>>>>>>>>>>>>>>>>>>>> required to replace all existing subdag functionality.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years and they are always a giant pain
> > >>>>>>>>>>>>>>>>>>>>>>>>>> to use. They are a constant source of user confusion and breakages during
> > >>>>>>>>>>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James Coder <jcoder01@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a UI concept. I use the subdag
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If you have a group of tasks that
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> need to finish before another group of tasks start, using a subdag is a
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies and I think also makes it easier
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> to follow the dag code.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin.kn@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <ash@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Question:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator anymore?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just replacing it with a UI grouping
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less to get wrong, and closer to what
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> users actually want to achieve with subdags?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in subdags could start running in
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should we not also just _entirely_ remove
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace it with something simpler?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I haven't used them extensively so may
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it has(?) to be of the form
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>   `parent_dag_id.subdag_id`.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own schedule_interval, but it has to match the parent
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>   dag
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own. (Does it make sense to do this?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>   Pausing just a sub dag would mean the sub dag would never execute, so
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>   the SubDagOperator would fail too.)
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to operate a subdag with -- always a
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>   bit of a kludge.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thoughts?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <ash@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm excited to see how this progresses.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   method to unpack subdag while parsing, and it will give a flat
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   structure at the task level
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I think. At least if
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone to collect feedback on AIP-34 on
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was previously briefly mentioned in the
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be done for Airflow 2.0, and one of the
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator attach tasks back to the root DAG.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving SubDagOperator related issues by
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reattaching all tasks back to the root dag while respecting dependencies
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> during parsing. The original grouping effect on the UI will be achieved
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> through grouping related tasks by metadata.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory function more reusable because you don't
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and child_dag_name in the function signature
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> anymore.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   method to unpack subdag while parsing, and it will give a flat
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   structure at the task level
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts like a
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   container and most of the original methods are removed. The signature
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   is also changed to *subdag_factory* with *subdag_args* and
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   *subdag_kwargs*. This is similar to the PythonOperator signature.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group & parent_group attributes
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   to BaseOperator*: This metadata is used to group tasks for rendering at
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   UI level. It may potentially extend further to group arbitrary tasks
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   outside the context of subdag to allow group-level operations (i.e.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   stop/trigger a group of tasks within the dag)
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to allow
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   (un)collapse a group of tasks for a flat structure to pair with the
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   first change instead of the original hierarchical structure.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please see related documents and PRs for details:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AIP:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any aspects that you agree/disagree with
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or need more clarification (especially the third change regarding
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup). Any comments are welcome and I am looking forward to it!
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Re: [AIP-34] Rewrite SubDagOperator

Posted by Yu Qian <yu...@gmail.com>.
Hi, Gerard, yes I agree it's possible to do this at UI level without any
fundamental change to the implementation. If expand_group() sees that two
groups are fully connected (i.e. every task in one parent group depends on
every task in another parent group), it can decide to collapse all those
child edges into a single edge between the parent groups to reduce the
burden on the layout() function. However, I did not find any existing
algorithm to do this within dagre, so we'll likely need to implement this
ourselves. Another hiccup is that at the moment it doesn't seem to be
possible to call setEdge() between two parent groups (aka clusters). If
someone has ideas on how to do this, please feel free to contribute.
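
To make the "fully connected" check above concrete, here is a rough sketch
in plain Python. It is only an illustration of the idea, not code from the
PR or from dagre: the function name collapse_fully_connected and the
(edges, group_of) data shapes are hypothetical, and it does not address the
setEdge() limitation between clusters.

```python
from collections import defaultdict


def collapse_fully_connected(edges, group_of):
    """Collapse task edges between fully connected groups into group edges.

    edges: iterable of (upstream_task, downstream_task) pairs.
    group_of: dict mapping every task id to its parent group id.
    Returns (group_edges, remaining_task_edges).
    """
    # Collect the members of each group so we know how many pairs to expect.
    members = defaultdict(set)
    for task, group in group_of.items():
        members[group].add(task)

    # Bucket cross-group edges by (upstream group, downstream group).
    between = defaultdict(set)
    remaining = []
    for up, down in edges:
        g_up, g_down = group_of[up], group_of[down]
        if g_up == g_down:
            remaining.append((up, down))
        else:
            between[(g_up, g_down)].add((up, down))

    group_edges = []
    for (g_up, g_down), pair_edges in between.items():
        # Fully connected: every (upstream, downstream) task pair is present.
        if len(pair_edges) == len(members[g_up]) * len(members[g_down]):
            group_edges.append((g_up, g_down))
        else:
            remaining.extend(sorted(pair_edges))
    return group_edges, remaining


if __name__ == "__main__":
    group_of = {"a1": "g1", "a2": "g1", "b1": "g2", "b2": "g2"}
    edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2")]
    print(collapse_fully_connected(edges, group_of))  # ([('g1', 'g2')], [])
```

Group pairs that are only partially connected keep their task-level edges,
which is the in-between case discussed next.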

One other consideration is that this example is only an extreme case. There
are other in-between cases that still require user intervention. Say 90% of
the tasks in group1 depend on 90% of the tasks in group2, and both groups
have more than 100 tasks. This will still cause a lot of edges on the graph,
and it's even harder to reduce because the parent groups are not fully
connected, so it's inaccurate to reduce them to a single edge between the
parents. In those cases, the user may still need to do something
themselves, e.g. adding a DummyOperator to the DAG to cut down the
edges. There will be some tradeoff because a DummyOperator takes a short
while to execute, as you mentioned.
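
For a sense of scale, a small back-of-the-envelope sketch in plain Python
(no Airflow imports; the helper names are made up for illustration).
Connecting n upstream tasks directly to m downstream tasks needs n * m
edges, while routing them through one intermediate dummy node needs n + m.

```python
def edge_count_direct(n_upstream, n_downstream):
    """Edges when every upstream task points at every downstream task."""
    return n_upstream * n_downstream


def edge_count_via_dummy(n_upstream, n_downstream):
    """Edges when all upstream tasks feed one dummy node that fans out."""
    return n_upstream + n_downstream


def build_edges_via_dummy(upstream, downstream, dummy_id="dummy"):
    """Return the explicit edge list for the upstream >> dummy >> downstream shape."""
    return [(u, dummy_id) for u in upstream] + [(dummy_id, d) for d in downstream]


if __name__ == "__main__":
    # Two groups of 50 tasks each, as in the inside_section_1/2 experiment.
    print(edge_count_direct(50, 50), edge_count_via_dummy(50, 50))  # 2500 100
    # 90 tasks depending on 90 tasks (90% of two hypothetical 100-task groups).
    print(edge_count_direct(90, 90), edge_count_via_dummy(90, 90))  # 8100 180
```

The dummy node only preserves the original dependencies when the connected
subsets really are all-to-all; otherwise it adds dependencies that were not
there before.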

There is a lot of room for improvement, but I don't think that's a
blocking issue for this AIP. So if you can move it to the voting stage,
that'll be fantastic.


On Thu, Aug 20, 2020 at 4:21 PM 耀 周 <zh...@icloud.com.invalid> wrote:

> +1
>
> > On Aug 18, 2020, at 23:55, Gerard Casas Saez <gc...@twitter.com.INVALID> wrote:
> >
> > Is it not possible to solve this at the UI level? Aka tell dagre to only
> > add 1 edge to the group instead of to all nodes in the group? No need to do
> > SubDag behaviour, but just reduce the edges on the graph. Should reduce
> > load time if I understand correctly.
> >
> > I would strongly avoid the Dummy operator since it will introduce delays on
> > operator execution (as it will need to execute 1 dummy operator and that
> > can be expensive imo).
> >
> > Overall though proposal looks good, unless anyone opposes it, I would move
> > this to vote mode :D
> >
> > Gerard Casas Saez
> > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> >
> >
> > On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yu...@gmail.com> wrote:
> >
> >> Hi, All,
> >> Here's the updated AIP-34:
> >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
> >> The PR has been fine-tuned with better UI interactions and added
> >> serialization of TaskGroup: https://github.com/apache/airflow/pull/10153
> >>
> >> Here are some experiment results:
> >> A made-up DAG containing 403 tasks and 5696 edges, grouped as shown in the
> >> code at the end of this message. Note that inside_section_2 is
> >> intentionally made to depend on all tasks in inside_section_1 to generate
> >> a large number of edges. The observation is that opening the top level
> >> graph is very quick, around 270ms. Expanding groups that don't have a lot
> >> of dense dependencies on other groups is also hardly noticeable, e.g.
> >> expanding section_1 takes 330ms. The part that takes time is expanding
> >> both groups inside_section_1 and inside_section_2. Because there are 2500
> >> edges between these two inner groups, it took 63 seconds to expand both of
> >> them. The majority of the time (more than 62 seconds) is actually taken by
> >> the layout() function in dagre. In other words, it's very fast to add
> >> nodes and edges, but laying them out on the graph takes time. This issue
> >> is not actually a problem specific to TaskGroup. Without TaskGroup, if a
> >> DAG contains too many edges, it takes time to lay out the graph too.
> >>
> >> On the other hand, a more realistic experiment with a production DAG
> >> containing about 400 tasks and 700 edges showed that grouping tasks into
> >> three levels of nested TaskGroup cut the upfront page opening time from
> >> around 6s to 500ms. (Obviously the time is paid back when the user gradually
> >> expands all the groups one by one, but normally people don't need to expand
> >> every group every time so it's still a big saving). The experiments were
> >> done on OS X Mojave, 2.2 GHz Intel Core i7, 16GB memory, Chrome.
> >>
> >> I can see a few possible improvements to TaskGroup (or how it's used) that
> >> can be done as a next step:
> >> 1). Like Gerard suggested, we can implement lazy-loading. Instead of
> >> displaying the whole DAG, we can limit the Graph View to show only a single
> >> TaskGroup, omitting its edges going out to other TaskGroups. This behaviour
> >> is more like SubDagOperator where users can zoom into/out of a TaskGroup
> >> and look at only tasks within that TaskGroup as if those are the only tasks
> >> on the DAG. This can be done with either background javascript calls or by
> >> making a new GET request with filtering parameters. Obviously the downside
> >> is that it's not as explicit as showing all the dependencies on the graph.
> >> 2). Users can improve the organization of the DAG themselves to reduce the
> >> number of edges. E.g. if every task in group2 depends on every task in
> >> group1, instead of doing group1 >> group2, they can add a DummyOperator in
> >> between and do this: group1 >> dummy >> group2. This cuts down the number
> >> of edges significantly and page load becomes much faster.
> >> 3). If we really want, we can improve the >> operator of TaskGroup to do 2)
> >> automatically. If it sees that both sides of >> are TaskGroup, it can
> >> create a DummyOperator on behalf of the user. The downside is that it may
> >> be too much magic.
> >>
> >> Thanks,
> >> Qian
> >>
> >> def create_section():
> >>     """
> >>     Create tasks in the outer section.
> >>     """
> >>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
> >>
> >>     with TaskGroup("inside_section_1") as inside_section_1:
> >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> >>
> >>     with TaskGroup("inside_section_2") as inside_section_2:
> >>         _ = [DummyOperator(task_id=f'task-{i + 1}',) for i in range(50)]
> >>
> >>     dummies[-1] >> inside_section_1
> >>     dummies[-2] >> inside_section_2
> >>     inside_section_1 >> inside_section_2
> >>
> >>
> >> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> >>     start = DummyOperator(task_id="start")
> >>
> >>     with TaskGroup("section_1") as section_1:
> >>         create_section()
> >>
> >>     some_other_task = DummyOperator(task_id="some-other-task")
> >>
> >>     with TaskGroup("section_2") as section_2:
> >>         create_section()
> >>
> >>     end = DummyOperator(task_id='end')
> >>
> >>     start >> section_1 >> some_other_task >> section_2 >> end
> >>
> >>
> >> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
> >> <gc...@twitter.com.invalid> wrote:
> >>
> >>> Re graph times. That makes sense. Let me know what you find. We may be able
> >>> to contribute on the lazy loading part.
> >>>
> >>> Looking forward to seeing the updated AIP!
> >>>
> >>>
> >>> Gerard Casas Saez
> >>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> >>>
> >>>
> >>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <ka...@gmail.com> wrote:
> >>>
> >>>> Permissions granted, let me know if you face any issues.
> >>>>
> >>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yu...@gmail.com> wrote:
> >>>>
> >>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> >>>>>
> >>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <ka...@gmail.com> wrote:
> >>>>>
> >>>>>> What's your ID? I.e. if you haven't created an account yet, please create
> >>>>>> one at https://cwiki.apache.org/confluence/signup.action and send us your
> >>>>>> ID and we will add permissions.
> >>>>>>
> >>>>>> Thanks. I'll edit the AIP. May I request permission to edit it?
> >>>>>>> My wiki user email is yuqian1990@gmail.com.
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yu...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Re Xinbin: Thanks. I'll edit the AIP. May I request permission to edit it?
> >>>>>>> My wiki user email is yuqian1990@gmail.com.
> >>>>>>>
> >>>>>>> Re Gerard: yes the UI loads all the nodes as json from the web server at
> >>>>>>> once. However, it only adds the top level nodes and edges to the graph when
> >>>>>>> the Graph View page is first opened, and then adds the expanded nodes to
> >>>>>>> the graph as the user expands them. From what I've experienced with DAGs
> >>>>>>> containing around 400 tasks (not using TaskGroup or SubDagOperator),
> >>>>>>> opening the whole dag in Graph View usually takes 5 seconds. Less than 60ms
> >>>>>>> of that is taken by loading the data from webserver. The remaining 4.9s+ is
> >>>>>>> taken by javascript functions in dagre-d3.min.js such as createNodes,
> >>>>>>> createEdgeLabels, etc and by rendering the graph. With TaskGroup being used
> >>>>>>> to group tasks into a smaller number of top-level nodes, the amount of data
> >>>>>>> loaded from webserver will remain about the same compared to a flat dag of
> >>>>>>> the same size, but the number of nodes and edges that need to be plotted on
> >>>>>>> the graph can be reduced significantly. So in theory this should speed up
> >>>>>>> the time it takes to open Graph View even without lazy-loading the data
> >>>>>>> (I'll experiment to find out). That said, if it comes to a point where
> >>>>>>> lazy-loading helps, we can still implement it as an improvement.
> >>>>>>>
> >>>>>>> Re James: the Tree View looks as if all the groups are fully expanded
> >>>>>>> (because under the hood all the tasks are in a single DAG). I'm less
> >>>>>>> worried about Tree View at the moment because it already has a mechanism
> >>>>>>> for collapsing tasks by the dependency tree. That said, the Tree View can
> >>>>>>> definitely be improved too with TaskGroup (e.g. collapse tasks in the same
> >>>>>>> TaskGroup when Tree View is first opened).
> >>>>>>>
> >>>>>>> For both suggestions, implementing them doesn't require fundamental changes
> >>>>>>> to the idea. I think we can have a basic working TaskGroup first, and then
> >>>>>>> improve it incrementally in several PRs as we get more feedback from the
> >>>>>>> community. What do you think?
> >>>>>>>
> >>>>>>> Qian
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jc...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I agree this looks great, one question, how does the tree view look?
> >>>>>>>>
> >>>>>>>> James Coder
> >>>>>>>>
> >>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <
> >>>>>> gcasassaez@twitter.com
> >>>>>>> .invalid>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> First of all, this is awesome!!
> >>>>>>>>>
> >>>>>>>>> Secondly, checking your UI code, seems you are loading all
> >>>>> operators
> >>>>>> at
> >>>>>>>>> once. Wondering if we can load them as needed (aka load
> >>> whenever
> >>>> we
> >>>>>>> click
> >>>>>>>>> the TaskGroup). Some of our DAGs are so large that they take
> >> forever
> >>>> to
> >>>>>> load
> >>>>>>>> on
> >>>>>>>>> the Graph view, so worried about this still being an issue
> >>> here.
> >>>> It
> >>>>>> may
> >>>>>>>> be
> >>>>>>>>> easily solvable by implementing lazy loading of the graph.
> >> Not
> >>>> sure
> >>>>>> how
> >>>>>>>>> easy to implement/add to the UI extension (and don't want to
> >>> push
> >>>>> for
> >>>>>>>> early
> >>>>>>>>> optimization as it's the root of all evil).
> >>>>>>>>> Gerard Casas Saez
> >>>>>>>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <
> >>>>>> bin.huangxb@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Yu,
> >>>>>>>>>>
> >>>>>>>>>> Thank you so much for taking on this. I was fairly
> >> distracted
> >>>>>>> previously
> >>>>>>>>>> and I didn't have the time to update the proposal. In fact,
> >>>> after
> >>>>>>>>>> discussing with Ash, Kaxil and Daniel, the direction of this
> >>> AIP
> >>>>> has
> >>>>>>>> been
> >>>>>>>>>> changed to favor the concept of TaskGroup instead of
> >> rewriting
> >>>>>>>>>> SubDagOperator (though it may make sense to deprecate SubDag
> >>> in a
> >>>>>>> future
> >>>>>>>>>> date.).
> >>>>>>>>>>
> >>>>>>>>>> Your PR is amazing and it has implemented the desired
> >>> features. I
> >>>>>> think
> >>>>>>>> we
> >>>>>>>>>> can focus on your new PR instead. Do you mind updating the
> >> AIP
> >>>>> based
> >>>>>>> on
> >>>>>>>>>> what you have done in your PR?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Bin
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <
> >>> yuqian1990@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi, all, I've added the basic UI changes to my proposed
> >>>>>>> implementation
> >>>>>>>> of
> >>>>>>>>>>> TaskGroup as UI grouping concept:
> >>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> >>>>>>>>>>>
> >>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup
> >> so
> >>>> i'm
> >>>>>>>> quoting
> >>>>>>>>>>> it here. The only thing I don't fully agree with is the restriction
> >>>>>>>>>>> "... *cannot* have dependencies between a Task in a TaskGroup and
> >>>>>>>>>>> either a Task in a different TaskGroup or a Task not in any
> >>>>>>>>>>> group". I think
> >>>>>>>>>>> this is over restrictive. Since TaskGroup is a UI concept,
> >>>> tasks
> >>>>>> can
> >>>>>>>> have
> >>>>>>>>>>> dependencies on tasks in other TaskGroup or not in any
> >>>> TaskGroup.
> >>>>>> In
> >>>>>>> my
> >>>>>>>>>> PR,
> >>>>>>>>>>> this is allowed. The graph edges will update accordingly
> >> when
> >>>>>>>> TaskGroups
> >>>>>>>>>>> are expanded/collapsed. TaskGroup is only helping to make
> >> the
> >>>> UI
> >>>>>> look
> >>>>>>>>>> less
> >>>>>>>>>>> crowded. Under the hood, everything is still a DAG of tasks
> >>> and
> >>>>>> edges
> >>>>>>>> so
> >>>>>>>>>>> things work normally. Here's a screenshot
> >>>>>>>>>>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> >>>>>>>>>>> of the UI interaction.
> >>>>>>>>>>>
> >>>>>>>>>>>    - Tasks can be added to a TaskGroup
> >>>>>>>>>>>    - You *can* have dependencies between Tasks in the same
> >>>>>>>>>>>      TaskGroup, but *cannot* have dependencies between a Task in a
> >>>>>>>>>>>      TaskGroup and either a Task in a different TaskGroup or a
> >>>>>>>>>>>      Task not in any group
> >>>>>>>>>>>    - You *can* have dependencies between a TaskGroup and either
> >>>>>>>>>>>      other TaskGroups or Tasks not in any group
> >>>>>>>>>>>    - The UI will by default render a TaskGroup as a single
> >>>>>>>>>>>      "object", but which you expand or zoom into in some way
> >>>>>>>>>>>    - You'd need some way to determine what the "status" of a
> >>>>>>>>>>>      TaskGroup was at least for UI display purposes
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding Jake's comment, I agree it's possible to
> >> implement
> >>>> the
> >>>>>>>>>> "retrying
> >>>>>>>>>>> tasks in a group" pattern he mentioned as an optional
> >> feature
> >>>> of
> >>>>>>>>>> TaskGroup
> >>>>>>>>>>> although that may go against having TaskGroup as a pure UI
> >>>>> concept.
> >>>>>>> For
> >>>>>>>>>> the
> >>>>>>>>>>> motivating example Jake provided, I suggest implementing
> >> both
> >>>>>>>>>>> SubmitLongRunningJobTask and PollJobStatusSensor in a
> >> single
> >>>>>>> operator.
> >>>>>>>> It
> >>>>>>>>>>> can do something like BaseSensorOperator.execute() does in
> >>>>>>> "reschedule"
> >>>>>>>>>>> mode, i.e. it first executes some code to submit the long
> >>>> running
> >>>>>> job
> >>>>>>>> to
> >>>>>>>>>>> the external service, and store the state (e.g. in XCom).
> >>> Then
> >>>>>>>> reschedule
> >>>>>>>>>>> itself. Subsequent runs then poke for the completion
> >> state.
> >>>>>>>>>>>
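The "submit once, then reschedule and poke" idea above can be sketched without Airflow at all. Everything in this block is hypothetical: the class name `SubmitAndPoll`, the `run_once` method, and the `store` dict (standing in for persisted state such as XCom) are illustration only and not an existing Airflow API.

```python
class SubmitAndPoll:
    """Hypothetical single-task pattern: submit on the first run, poll afterwards."""

    def __init__(self, submit, is_done):
        self.submit = submit      # callable that starts the job and returns a job id
        self.is_done = is_done    # callable(job_id) -> bool

    def run_once(self, store):
        """One scheduled run; `store` stands in for state kept between runs."""
        if "job_id" not in store:
            # First run: submit the long-running job and remember its id.
            store["job_id"] = self.submit()
            return "reschedule"
        # Later runs: only check whether the job has finished.
        return "done" if self.is_done(store["job_id"]) else "reschedule"


# Simulated external service: not finished on the first poll, finished on the second.
polls = iter([False, True])
task = SubmitAndPoll(submit=lambda: "job-1", is_done=lambda job_id: next(polls))
store = {}
results = [task.run_once(store) for _ in range(3)]
```

Between runs the task holds no worker slot in this model, which is the property the reschedule mode is meant to give; a retry of the whole task would clear `store` and submit again.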
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> >>>>>>>>>> <jferriero@google.com.invalid
> >>>>>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I really like this idea of a TaskGroup container as I
> >> think
> >>>> this
> >>>>>>> will
> >>>>>>>>>> be
> >>>>>>>>>>>> much easier to use than SubDag.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'd like to propose an optional behavior for special retry
> >>>>>> mechanics
> >>>>>>>>>> via
> >>>>>>>>>>> a
> >>>>>>>>>>>> TaskGroup.retry_all property.
> >>>>>>>>>>>> This way I could use TaskGroup to replace my favorite use
> >> of
> >>>>>> SubDag
> >>>>>>>> for
> >>>>>>>>>>>> atomically retrying tasks of the pattern "act on external
> >>>> state
> >>>>>> then
> >>>>>>>>>>>> reschedule poll until desired state reached".
> >>>>>>>>>>>>
> >>>>>>>>>>>> Motivating use case I have for a SubDag is very simple two
> >>>> task
> >>>>>>> group
> >>>>>>>>>>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> >>>>>>>>>>>> I use SubDag because it gives me an easy way to retry
> >> the
> >>>>>>>>>>> SubmitJobTask
> >>>>>>>>>>>> if something about the PollJobSensor fails.
> >>>>>>>>>>>> This pattern would be really nice for jobs that are
> >> expected
> >>>> to
> >>>>>> run
> >>>>>>> a
> >>>>>>>>>>> long
> >>>>>>>>>>>> time (because the sensor can use reschedule mode
> >>>> freeing
> >>>>> up
> >>>>>>>>>> slots)
> >>>>>>>>>>>> but might fail for a retryable reason.
> >>>>>>>>>>>> However, using SubDag to meet this use case defeats the
> >>>> purpose
> >>>>>>>> because
> >>>>>>>>>>>> SubDag infamously
> >>>>>>>>>>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> >>>>>>>>>>>> blocks a "controller" slot for the entire duration.
> >>>>>>>>>>>> This may feel like a cyclic behavior but in reality it is
> >> very
> >>>>> common
> >>>>>>> for
> >>>>>>>>>> a
> >>>>>>>>>>>> single operator to submit job / wait til done.
> >>>>>>>>>>>> We could use this case to refactor many operators (e.g. BQ,
> >>>>> Dataproc,
> >>>>>>>>>>>> Dataflow) to be implemented as TaskGroup[SubmitTask >>
> >>>> PollTask]
> >>>>>>> with
> >>>>>>>>>> an
> >>>>>>>>>>>> optional reschedule mode if user knows that this job may
> >>> take
> >>>> a
> >>>>>> long
> >>>>>>>>>>> time.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'd be happy to do the development work on adding this
> >> specific
> >>>>> retry
> >>>>>>>>>>> behavior
> >>>>>>>>>>>> to TaskGroup once the base concept is implemented if
> >> others
> >>> in
> >>>>> the
> >>>>>>>>>>>> community would find this a useful feature.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>> Jake
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> >>>>>>>> Jarek.Potiuk@polidea.com
> >>>>>>>>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> All for it :) . I think we are getting closer to have
> >>> regular
> >>>>>>>>>> planning
> >>>>>>>>>>>> and
> >>>>>>>>>>>>> making some structured approach to 2.0 and starting task
> >>>> force
> >>>>>> for
> >>>>>>> it
> >>>>>>>>>>>> soon,
> >>>>>>>>>>>>> so I think this should be perfectly fine to discuss and
> >>> even
> >>>>>> start
> >>>>>>>>>>>>> implementing what's beyond as soon as we make sure that
> >> we
> >>>> are
> >>>>>>>>>>>> prioritizing
> >>>>>>>>>>>>> 2.0 work.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> J,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <
> >>>> yuqian1990@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Jarek,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I agree we should not change the behaviour of the
> >> existing
> >>>>>>>>>>>> SubDagOperator
> >>>>>>>>>>>>>> till Airflow 2.1. Is it okay to continue the discussion
> >>>> about
> >>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>> as
> >>>>>>>>>>>>>> a brand new concept/feature independent from the
> >> existing
> >>>>>>>>>>>> SubDagOperator?
> >>>>>>>>>>>>>> In other words, shall we add TaskGroup as a UI grouping
> >>>>> concept
> >>>>>>>>>> like
> >>>>>>>>>>>> Ash
> >>>>>>>>>>>>>> suggested, and not touch SubDagOperator at all.
> >> Whenever
> >>> we
> >>>>> are
> >>>>>>>>>>> ready
> >>>>>>>>>>>>> with
> >>>>>>>>>>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow
> >>> 2.1.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I really like Ash's idea of simplifying the
> >> SubDagOperator
> >>>>> idea
> >>>>>>>>>> into
> >>>>>>>>>>> a
> >>>>>>>>>>>>>> simple UI grouping concept. I think Xinbin's idea of
> >>>>>> "reattaching
> >>>>>>>>>> all
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> tasks to the root DAG" is the way to go. And I see James
> >>>>> pointed
> >>>>>>>>>> out
> >>>>>>>>>>> we
> >>>>>>>>>>>>>> need some helper functions to simplify dependencies
> >>> setting
> >>>> of
> >>>>>>>>>>>> TaskGroup.
> >>>>>>>>>>>>>> Xinbin put up a pretty elegant example in his PR
> >>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I think
> >>>> having
> >>>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>> as
> >>>>>>>>>>>>>> a UI concept should be a relatively small change. We can
> >>>>>> simplify
> >>>>>>>>>>>>> Xinbin's
> >>>>>>>>>>>>>> PR further. So I put up this alternative proposal here:
> >>>>>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I have not done any UI changes due to lack of experience
> >>>> with
> >>>>>> web
> >>>>>>>>>> UI.
> >>>>>>>>>>>> If
> >>>>>>>>>>>>>> anyone's interested, please take a look at the PR.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Qian
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> >>>>>>>>>>> Jarek.Potiuk@polidea.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Similar point here to the other ideas that are popping
> >>> up.
> >>>>>> Maybe
> >>>>>>>>>> we
> >>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>> just focus on completing 2.0 and make all discussions
> >>> about
> >>>>>>>>>> further
> >>>>>>>>>>>>>>> improvements to 2.1? While those are important
> >>> discussions
> >>>>> (and
> >>>>>>>>>> we
> >>>>>>>>>>>>> should
> >>>>>>>>>>>>>>> continue them in the  near future !) I think at this
> >>> point
> >>>>>>>>>> focusing
> >>>>>>>>>>>> on
> >>>>>>>>>>>>>>> delivering 2.0 in its current shape should be our focus
> >>>> now ?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> J.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> >>>>>>>>>>> bin.huangxb@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Daniel
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same API
> >> as a
> >>>> DAG
> >>>>>>>>>>> object
> >>>>>>>>>>>>>>> related
> >>>>>>>>>>>>>>>> to task dependencies, but it will not have anything
> >>>> related
> >>>>> to
> >>>>>>>>>>>> actual
> >>>>>>>>>>>>>>>> execution or scheduling.
> >>>>>>>>>>>>>>>> I will update the AIP according to this over the
> >>> weekend.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when
> >> you
> >>>>>>>>>> import
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>> object
> >>>>>>>>>>>>>>>> you can import it with parameters to determine the
> >> shape
> >>>> of
> >>>>>> the
> >>>>>>>>>>>> DAG.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it serve a
> >>>>> similar
> >>>>>>>>>>>> purpose
> >>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>> DAG factory function?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> >>>>>>>>>>>>>>> daniel.imberman@gmail.com
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi Bin,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG
> >> object
> >>>>> (e.g.
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>> bitwise
> >>>>>>>>>>>>>>>>> operator for task dependencies). We could even make a
> >>>>>>>>>>>> “DAGTemplate”
> >>>>>>>>>>>>>>>> object
> >>>>>>>>>>>>>>>>> s.t. when you import the object you can import it
> >> with
> >>>>>>>>>>> parameters
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> determine the shape of the DAG.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> >>>>>>>>>>>>> bin.huangxb@gmail.com
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>> The TaskGroup will not take schedule interval as a
> >>>>> parameter
> >>>>>>>>>>>>> itself,
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>> depends on the DAG where it attaches to. In my
> >> opinion,
> >>>> the
> >>>>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>> only contain a group of tasks with interdependencies,
> >>> and
> >>>>> the
> >>>>>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>>>>>> behaves like a task. It doesn't contain any
> >>>>>>>>>>> execution/scheduling
> >>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs
> >>>> etc.)
> >>>>>>>>>>> like
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>> DAG
> >>>>>>>>>>>>>>>>> does.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> For example, there is the scenario that the schedule
> >>>>>>>>>> interval
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>> DAG
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20
> >>> min.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case
> >> that
> >>>> you
> >>>>>>>>>> want
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> achieve?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Bin
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> >>>>>>>>>> thanosxnicholas@gmail.com
> >>>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi Bin,
> >>>>>>>>>>>>>>>>>> Using TaskGroup, Is the schedule interval of
> >> TaskGroup
> >>>> the
> >>>>>>>>>>> same
> >>>>>>>>>>>>> as
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> parent DAG? My main concern is whether the schedule
> >>>>>>>>>> interval
> >>>>>>>>>>> of
> >>>>>>>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>>>>>>> could be different with that of the DAG? For
> >> example,
> >>>>> there
> >>>>>>>>>>> is
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> scenario
> >>>>>>>>>>>>>>>>>> that the schedule interval of DAG is 1 hour and the
> >>>>>>>>>> schedule
> >>>>>>>>>>>>>> interval
> >>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>> TaskGroup is 20 min.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>> Nicholas
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> >>>>>>>>>>>>>> bin.huangxb@gmail.com
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi Nicholas,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of
> >>> SubDagOperator,
> >>>>>>>>>>> maybe
> >>>>>>>>>>>>> it
> >>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>> throw
> >>>>>>>>>>>>>>>>>>> an error? But in the original proposal, the
> >> subdag's
> >>>>>>>>>>>>>>>> schedule_interval
> >>>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to
> >>> replace
> >>>>>>>>>>>> SubDag,
> >>>>>>>>>>>>>>> there
> >>>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>> be no subdag schedule_interval.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Bin
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> >>>>>>>>>>>> thanosxnicholas@gmail.com
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi Bin,
> >>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was confused
> >>> whether
> >>>>>>>>>> the
> >>>>>>>>>>>>>>> schedule
> >>>>>>>>>>>>>>>>>>>> interval of SubDAG is different from that of the
> >>>> parent
> >>>>>>>>>>>> DAG?
> >>>>>>>>>>>>> I
> >>>>>>>>>>>>>>> have
> >>>>>>>>>>>>>>>>>>>> discussed with Jiajie Zhong about the schedule
> >>>> interval
> >>>>>>>>>>> of
> >>>>>>>>>>>>>>> SubDAG.
> >>>>>>>>>>>>>>>> If
> >>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> SubDagOperator has a different schedule interval,
> >>> what
> >>>>>>>>>>> will
> >>>>>>>>>>>>>>> happen
> >>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>> scheduler to schedule the parent DAG?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>> Nicholas Jiang
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> >>>>>>>>>>>>>>>> bin.huangxb@gmail.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback!
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I have rethought about the concept of subdag and
> >>> task
> >>>>>>>>>>>>>> groups. I
> >>>>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> better way to approach this is to entirely remove
> >>>>>>>>>>> subdag
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> introduce
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> concept of TaskGroup, which is a container of
> >> tasks
> >>>>>>>>>>> along
> >>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>> their
> >>>>>>>>>>>>>>>>>>>>> dependencies *without execution/scheduling logic
> >>> as a
> >>>>>>>>>>>> DAG*.
> >>>>>>>>>>>>>> The
> >>>>>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>>>>>>> purpose of it is to group a list of tasks, but
> >> you
> >>>>>>>>>>> still
> >>>>>>>>>>>>> need
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> add
> >>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> a DAG for execution.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Here is a small code snippet.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> ```
> >>>>>>>>>>>>>>>>>>>>> class TaskGroup:
> >>>>>>>>>>>>>>>>>>>>>     """
> >>>>>>>>>>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> >>>>>>>>>>>>>>>>>>>>>     """
> >>>>>>>>>>>>>>>>>>>>>     def __init__(self, group_id, default_args):
> >>>>>>>>>>>>>>>>>>>>>         pass
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> """
> >>>>>>>>>>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to a DAG
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> This can be declared in a separate file from the dag file
> >>>>>>>>>>>>>>>>>>>>> """
> >>>>>>>>>>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
> >>>>>>>>>>>>>>>>>>>>> download_group.add_task(task1)
> >>>>>>>>>>>>>>>>>>>>> task2.dag = download_group
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> with download_group:
> >>>>>>>>>>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> [task1, task2] >> task3
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> """Add it to a DAG for execution"""
> >>>>>>>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> >>>>>>>>>>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
> >>>>>>>>>>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> >>>>>>>>>>>>>>>>>>>>>     start >> download_group
> >>>>>>>>>>>>>>>>>>>>>     # this is equivalent to
> >>>>>>>>>>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> >>>>>>>>>>>>>>>>>>>>> ```
> >>>>>>>>>>>>>>>>>>>>>
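As a rough illustration of how `start >> download_group` could resolve to the group's entry tasks, here is a self-contained sketch in plain Python. It does not use Airflow; the `Task` and `TaskGroup` classes and their `roots()`/`leaves()` helpers are assumptions for this example, not the proposed implementation.

```python
class Task:
    """Minimal stand-in for an operator: an id plus upstream/downstream id sets."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def __rshift__(self, other):
        # A group on the right-hand side means "all tasks that start the group".
        targets = other.roots() if isinstance(other, TaskGroup) else [other]
        for target in targets:
            self.downstream.add(target.task_id)
            target.upstream.add(self.task_id)
        return other


class TaskGroup:
    """A container of tasks with no scheduling or execution logic of its own."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = []

    def add_task(self, task):
        self.tasks.append(task)
        return task

    def roots(self):
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.upstream & ids)]

    def leaves(self):
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.downstream & ids)]

    def __rshift__(self, other):
        # A group on the left-hand side means "all tasks that end the group".
        for leaf in self.leaves():
            leaf >> other
        return other


group = TaskGroup("download")
task1, task2, task3 = (group.add_task(Task(name)) for name in ("task1", "task2", "task3"))
task1 >> task3
task2 >> task3

start, end = Task("start"), Task("end")
start >> group >> end
```

The group itself never appears in the dependency sets; only real tasks do, which matches the idea that the group carries no execution semantics.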
> >>>>>>>>>>>>>>>>>>>>> With this, we can still reuse a group of tasks
> >> and
> >>>>>>>>>> set
> >>>>>>>>>>>>>>>> dependencies
> >>>>>>>>>>>>>>>>>>>> between
> >>>>>>>>>>>>>>>>>>>>> them; it avoids the boilerplate code from using
> >>>>>>>>>>>>>> SubDagOperator,
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>>>>>> declare dependencies as `task >> task_group >>
> >>> task`.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> User migration wise, we can introduce it before
> >>>>>>>>>> Airflow
> >>>>>>>>>>>> 2.0
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> allow
> >>>>>>>>>>>>>>>>>>>>> gradual transition. Then we can decide if we
> >> still
> >>>>>>>>>> want
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>> keep
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> SubDagOperator or simply remove it.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Any thoughts?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>> Bin
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime
> >> Beauchemin <
> >>>>>>>>>>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> +1, proposal looks good.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The original intention was really to have tasks
> >>>>>>>>>>> groups
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>> zoom-in/out
> >>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>> the UI. The original reasoning was to reuse the
> >>> DAG
> >>>>>>>>>>>>> object
> >>>>>>>>>>>>>>>> since
> >>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> >>>>>>>>>>> create
> >>>>>>>>>>>>>>>> underlying
> >>>>>>>>>>>>>>>>>>>>>> confusions since a DAG is much more than just a
> >>>>>>>>>> group
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>>> tasks.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Max
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> >>>>>>>>>>>>>>>>>>>>> joshipoornima06@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thank you for your email.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> >>>>>>>>>>>>>>>>>>> bin.huangxb@gmail.com
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> >>>>>>>>>>>>>> rewrites
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> >>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> >>>>>>>>>> it
> >>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>> give a
> >>>>>>>>>>>>>>>>>>>> flat
> >>>>>>>>>>>>>>>>>>>>>>>>>> structure at
> >>>>>>>>>>>>>>>>>>>>>>>>>> the task level
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already
> >>>>>>>>>> does
> >>>>>>>>>>>>> this I
> >>>>>>>>>>>>>>>>> think.
> >>>>>>>>>>>>>>>>>> At
> >>>>>>>>>>>>>>>>>>>>> least
> >>>>>>>>>>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I am not sure about serialized_dag
> >>>>>>>>>>> representation,
> >>>>>>>>>>>>> but
> >>>>>>>>>>>>>> at
> >>>>>>>>>>>>>>>>> least
> >>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
> >>>>>>>>>> In
> >>>>>>>>>>> my
> >>>>>>>>>>>>>>>> proposal
> >>>>>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>>> also
> >>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
> >>>>>>>>>> from
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>> subdag
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> add
> >>>>>>>>>>>>>>>>>>>>>> them
> >>>>>>>>>>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG
> >> graph
> >>>>>>>>>>>> will
> >>>>>>>>>>>>>> look
> >>>>>>>>>>>>>>>>>> exactly
> >>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> same as without subdag but with metadata
> >>>>>>>>>> attached
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>> those
> >>>>>>>>>>>>>>>>>>>> sections.
> >>>>>>>>>>>>>>>>>>>>>>> These
> >>>>>>>>>>>>>>>>>>>>>>>> metadata will be later on used to render in
> >> the
> >>>>>>>>>>> UI.
> >>>>>>>>>>>>> So
> >>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>>> parsing
> >>>>>>>>>>>>>>>>>>>>> (
> >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
> >>>>>>>>>> the
> >>>>>>>>>>>>>>> *root_dag
> >>>>>>>>>>>>>>>>>>>> *instead
> >>>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>> *root_dag +
> >>>>>>>>>>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> >>>>>>>>>>>>>>>>>> current_group=section-1,
> >>>>>>>>>>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> >>>>>>>>>>> naming
> >>>>>>>>>>>>>>>>>>> suggestions),
> >>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> reason for parent_group is that we can have
> >>>>>>>>>>> nested
> >>>>>>>>>>>>>> group
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> still
> >>>>>>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>>>> able to capture the dependency.
> >>>>>>>>>>>>>>>>>>>>>>>>
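The flattening described above can be sketched as a small recursive function. This is an illustration only: the nested-list input format, the `flatten` name, and the task ids are assumptions; only the `current_group` and `parent_group` keys come from the message.

```python
def flatten(group_id, spec, parent_group=None):
    """Return a flat list of (task_id, metadata) pairs for a nested group spec.

    `spec` is a list whose items are either a task id (str) or a
    (sub_group_id, children) tuple describing a nested group.
    """
    flat = []
    for item in spec:
        if isinstance(item, tuple):
            sub_group_id, children = item
            # Nested group: its tasks are pulled up into the same flat list.
            flat.extend(flatten(sub_group_id, children, parent_group=group_id))
        else:
            flat.append(
                (item, {"current_group": group_id, "parent_group": parent_group})
            )
    return flat


# Hypothetical root DAG with one section that itself contains a nested section.
root_spec = [
    "start",
    ("section-1", ["section-1-task-1", ("section-1-inner", ["section-1-inner-task"])]),
    "end",
]
tasks = dict(flatten("root_dag", root_spec))
```

Every task ends up at the same level, and the metadata is enough for a UI to rebuild the nesting.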
> >>>>>>>>>>>>>>>>>>>>>>>> Runtime DAG:
> >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> While at the UI, what we see would be
> >> something
> >>>>>>>>>>>> like
> >>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>>>>> utilizing
> >>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into
> >>>>>>>>>> in
> >>>>>>>>>>>> some
> >>>>>>>>>>>>>>> way.
> >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> The benefits I can see are:
> >>>>>>>>>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> >>>>>>>>>>> complexity
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>>>> SubDag
> >>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>> execution
> >>>>>>>>>>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> >>>>>>>>>> using
> >>>>>>>>>>>>>> SubDag.
> >>>>>>>>>>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
> >>>>>>>>>>>>> reusable
> >>>>>>>>>>>>>>> dag
> >>>>>>>>>>>>>>>>> code
> >>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> declare dependencies between them. And with
> >> the
> >>>>>>>>>>> new
> >>>>>>>>>>>>>>>>>>> SubDagOperator
> >>>>>>>>>>>>>>>>>>>>> (see
> >>>>>>>>>>>>>>>>>>>>>>> AIP
> >>>>>>>>>>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
> >>>>>>>>>>>>> function
> >>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>> generating 1
> >>>>>>>>>>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
> >>>>>>>>>>> (in
> >>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>> case,
> >>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>>>>>>>> extract all underlying tasks and append to the
> >>>>>>>>>>> root
> >>>>>>>>>>>>>> dag).
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> >>>>>>>>>>>> with a
> >>>>>>>>>>>>>>>>>> simpler
> >>>>>>>>>>>>>>>>>>>>>> concept
> >>>>>>>>>>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> >>>>>>>>>> out
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> contents
> >>>>>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>>> SubDag
> >>>>>>>>>>>>>>>>>>>>>>>> and becomes more like
> >>>>>>>>>>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> >>>>>>>>>>>>>>>>>>>>>>> (forgive
> >>>>>>>>>>>>>>>>>>>>>>>> me about the crazy name..). In this case, is it
> >>>>>>>>>>>> still
> >>>>>>>>>>>>>>>>>>> necessary
> >>>>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>> keep the
> >>>>>>>>>>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a
> >>>>>>>>>>>> name?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> >>>>>>>>>>>> Chris
> >>>>>>>>>>>>>>> Palmer
> >>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>> helping
> >>>>>>>>>>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup,
> >> I
> >>>>>>>>>>>> will
> >>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>> paste
> >>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>> here.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> >>>>>>>>>> in
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> same
> >>>>>>>>>>>>>>>>>>>> TaskGroup,
> >>>>>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> >>>>>>>>>> a
> >>>>>>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>> either a
> >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> >>>>>>>>>> in
> >>>>>>>>>>>> any
> >>>>>>>>>>>>>>> group
> >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> >>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> either
> >>>>>>>>>>>>>>>>>>>>> other
> >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> >>>>>>>>>> as
> >>>>>>>>>>> a
> >>>>>>>>>>>>>> single
> >>>>>>>>>>>>>>>>>>>> "object",
> >>>>>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> >>>>>>>>>>>>> "status"
> >>>>>>>>>>>>>>> of a
> >>>>>>>>>>>>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>>>>>>>>>>>> was
> >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I agree with Chris:
> >>>>>>>>>>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> >>>>>>>>>>> executor), I
> >>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> >>>>>>>>>> to
> >>>>>>>>>>>>>>> implement
> >>>>>>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>>>>>>> metadata
> >>>>>>>>>>>>>>>>>>>>>>>> operations that allows start/stop a group of
> >>>>>>>>>>> tasks
> >>>>>>>>>>>>>> etc.)
> >>>>>>>>>>>>>>>>>>>>>>>> - From the UI's View, it should be able to
> >> pick
> >>>>>>>>>>> up
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> individual
> >>>>>>>>>>>>>>>>>>>>>> tasks'
> >>>>>>>>>>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> >>>>>>>>>> status
> >>>>>>>>>>>>>>>>>>>>>>>>
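One way the UI could derive a group's status from its tasks is a simple priority rule. The sketch below is an assumption, not something settled in this thread: the ordering in `PRIORITY`, the state strings, and the `group_status` name are all illustrative.

```python
# Assumed ordering: the "worst" or least finished state wins.
# None stands for a task that has no state yet.
PRIORITY = ["failed", "upstream_failed", "running", "queued", None, "success"]


def group_status(task_states):
    """Return the display status for a group given its tasks' states."""
    states = set(task_states)
    for state in PRIORITY:
        if state in states:
            return state
    return None  # empty group or only unknown states


running = group_status(["success", "running", "success"])
failed = group_status(["success", "failed", "running"])
done = group_status(["success", "success"])
pending = group_status(["success", None])
```

With this rule a group only shows as successful once every task in it has succeeded, and nothing is stored for the group itself in the backend.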
> >>>>>>>>>>>>>>>>>>>>>>>> Bin
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> >>>>>>>>>> Imberman
> >>>>>>>>>>> <
> >>>>>>>>>>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator to tie dags together but
> >>>>>>>>>>>>>>>>>>>>>>>>> I think that sounds pretty great! I wonder if we could essentially write
> >>>>>>>>>>>>>>>>>>>>>>>>> in the ability to set dependencies to all starter-tasks for that DAG.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly UI concept. It doesn’t need
> >>>>>>>>>>>>>>>>>>>>>>>>> to execute separately, you’re just adding more tasks to the queue that
> >>>>>>>>>>>>>>>>>>>>>>>>> will be executed when there are resources available.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <chris@crpalmer.com>
> >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex abstraction. I think what is
> >>>>>>>>>>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a high level I think you want
> >>>>>>>>>>>>>>>>>>>>>>>>> this functionality:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in the same TaskGroup, but
> >>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in a TaskGroup and either a
> >>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not in any group
> >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either other
> >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup as a single "object", but
> >>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the "status" of a TaskGroup was
> >>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Not sure if it would need to be a top level object with its own database
> >>>>>>>>>>>>>>>>>>>>>>>>> table and model or just another attribute on tasks. I think you could
> >>>>>>>>>>>>>>>>>>>>>>>>> build it in a way such that from the scheduler's point of view a DAG with
> >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any differently. So it really just becomes
> >>>>>>>>>>>>>>>>>>>>>>>>> a shortcut for setting dependencies between sets of Tasks, and allows the
> >>>>>>>>>>>>>>>>>>>>>>>>> UI to simplify the render of the DAG structure.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Chris
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> >>>>>>>>>>>>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Agree with James (and think it's actually the more important issue to
> >>>>>>>>>>>>>>>>>>>>>>>>>> fix), but I am still convinced Ash's idea is the right way forward (just
> >>>>>>>>>>>>>>>>>>>>>>>>>> it might require a bit more work to deprecate than adding visual grouping
> >>>>>>>>>>>>>>>>>>>>>>>>>> in the UI).
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> There was a previous thread about this, FYI, with more context on why
> >>>>>>>>>>>>>>>>>>>>>>>>>> subdags are bad and potential solutions:
> >>>>>>>>>>>>>>>>>>>>>>>>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> >>>>>>>>>>>>>>>>>>>>>>>>>> A solution I outline there to James's problem is, e.g., enabling the
> >>>>>>>>>>>>>>>>>>>>>>>>>> `>>` operator for Airflow operators to work with DAGs as well. I see this
> >>>>>>>>>>>>>>>>>>>>>>>>>> being separate from Ash's solution for DAG grouping in the UI, but one of
> >>>>>>>>>>>>>>>>>>>>>>>>>> the two items required to replace all existing subdag functionality.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years and they are always a giant
> >>>>>>>>>>>>>>>>>>>>>>>>>> pain to use. They are a constant source of user confusion and breakages
> >>>>>>>>>>>>>>>>>>>>>>>>>> during upgrades. Would love to see them gone :).
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James Coder <jcoder01@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a UI concept. I use the subdag
> >>>>>>>>>>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If you have a group of tasks that
> >>>>>>>>>>>>>>>>>>>>>>>>>>> need to finish before another group of tasks starts, using a subdag is a
> >>>>>>>>>>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies, and I think it also makes it
> >>>>>>>>>>>>>>>>>>>>>>>>>>> easier to follow the dag code.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin.kn@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <ash@apache.org>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Question:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator anymore?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just replacing it with a UI grouping
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less to get wrong, and closer to what
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> users actually want to achieve with subdags?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in subdags could start running in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should we not also just _entirely_ remove
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace it with something simpler?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I haven't used them extensively so may
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it has(?) to be of the form
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own schedule_interval, but it has to match the parent
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> dag
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own. (Does it make sense to do this?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the sub dag would never execute, so
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to operate a subdag with -- always a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> bit of a kludge.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thoughts?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <ash@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm excited to see how this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> progresses.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* method to unpack subdag while parsing, and it will
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> give a flat structure at the task level
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I think. At least
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> if I've understood your idea here correctly.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huangxb@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone to collect feedback on the AIP-34 on
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was previously briefly mentioned in the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be done for Airflow 2.0, and one of the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator attach tasks back to the root DAG.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving SubDagOperator related issues by
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reattaching all tasks back to the root dag while respecting dependencies
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> during parsing. The original grouping effect on the UI will be achieved
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> through grouping related tasks by metadata.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory function more reusable because you
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't need to have parent_dag_name and child_dag_name in the function
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> signature anymore.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* method to unpack subdag while parsing, and it will
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> give a flat structure at the task level
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts like a
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> container and most of the original methods are removed. The signature
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is also changed to *subdag_factory* with *subdag_args* and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *subdag_kwargs*. This is similar to the PythonOperator signature.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group & parent_group
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> attributes to BaseOperator*: This metadata is used to group tasks for
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rendering at UI level. It may potentially extend further to group
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> arbitrary tasks outside the context of subdag to allow group-level
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> operations (i.e. stop/trigger a group of tasks within the dag)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to allow
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a flat structure to pair with the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> first change instead of the original hierarchical structure.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please see related documents and PRs for details:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AIP:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any aspects that you agree/disagree
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with or need more clarification (especially the third change regarding
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup). Any comments are welcome and I am looking forward to it!
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bin
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>> Thanks & Regards
> >>>>>>>>>>>>>>>>>>>>>>> Poornima
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Jarek Potiuk
> >>>>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal
> >> Software
> >>>>>> Engineer
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129>
> >> <+48660796129
> >>>>>>>>>>>>> <+48%20660%20796%20129>>
> >>>>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Jarek Potiuk
> >>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software
> >>>>> Engineer
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> >>>>>>>>>>>>> <+48%20660%20796%20129>>
> >>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Jacob Ferriero*
> >>>>>>>>>>>>
> >>>>>>>>>>>> Strategic Cloud Engineer: Data Engineering
> >>>>>>>>>>>>
> >>>>>>>>>>>> jferriero@google.com
> >>>>>>>>>>>>
> >>>>>>>>>>>> 617-714-2509
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by 耀 周 <zh...@icloud.com.INVALID>.
+1

> On Aug 18, 2020, at 23:55, Gerard Casas Saez <gc...@twitter.com.INVALID> wrote:
> 
> Is it not possible to solve this at the UI level? Aka tell dagre to only
> add 1 edge to the group instead of to all nodes in the group? No need to do
> SubDag behaviour, but just reduce the edges on the graph. Should reduce
> load time if I understand correctly.
> 
> I would strongly avoid the Dummy operator since it will introduce delays on
> operator execution (as it will need to execute 1 dummy operator and that
> can be expensive imo).
> 
> Overall, though, the proposal looks good. Unless anyone opposes it, I would
> move this to vote mode :D
> 
> Gerard Casas Saez
> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> 
> 
> On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yu...@gmail.com> wrote:
> 
>> Hi, All,
>> Here's the updated AIP-34
>> <
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator
>>> .
>> The PR has been fine-tuned with better UI interactions and added
>> serialization of TaskGroup: https://github.com/apache/airflow/pull/10153
>> 
>> Here are some experiment results:
>> A made-up DAG containing 403 tasks and 5696 edges, grouped as in the code
>> at the end of this message. Note that inside_section_2 is intentionally
>> made to depend on all tasks in inside_section_1 to generate a large number
>> of edges. The observation is that opening the top-level graph is very
>> quick, around 270ms. Expanding groups that don't have a lot of dense
>> dependencies on other groups is also hardly noticeable, e.g. expanding
>> section_1 takes 330ms. The part that takes time is expanding both
>> inside_section_1 and inside_section_2. Because there are 2500 edges between
>> these two inner groups, it took 63 seconds to expand both of them. The
>> majority of the time (more than 62 seconds) is actually taken by the
>> layout() function in dagre. In other words, it's very fast to add nodes and
>> edges, but laying them out on the graph takes time. This issue is not
>> specific to TaskGroup: without TaskGroup, if a DAG contains too many edges,
>> it also takes time to lay out the graph.
>> 
>> On the other hand, a more realistic experiment with a production DAG
>> containing about 400 tasks and 700 edges showed that grouping tasks into
>> three levels of nested TaskGroup cut the upfront page opening time from
>> around 6s to 500ms. (Obviously the time is paid back when the user
>> gradually expands all the groups one by one, but normally people don't need
>> to expand every group every time, so it's still a big saving.) The
>> experiments were done on OS X Mojave, 2.2 GHz Intel Core i7, 16GB memory,
>> Chrome.
>> 
>> I can see a few possible improvements to TaskGroup (or how it's used) that
>> can be done as a next-step:
>> 1). Like Gerard suggested, we can implement lazy-loading. Instead of
>> displaying the whole DAG, we can limit the Graph View to show only a single
>> TaskGroup, omitting its edges going out to other TaskGroups. This behaviour
>> is more like SubDagOperator where users can zoom into/out of a TaskGroup
>> and look at only tasks within that TaskGroup as if those are the only tasks
>> on the DAG. This can be done with either background javascript calls or by
>> making a new get request with filtering parameters. Obviously the downside
>> is that it's not as explicit as showing all the dependencies on the graph.
>> 2). Users can improve the organization of the DAG themselves to reduce the
>> number of edges. E.g. if every task in group2 depends on every task in
>> group1, instead of doing group1 >> group2, they can add a DummyOperator in
>> between and do this: group1 >> dummy >> group2. This cuts down the number
>> of edges significantly and page load becomes much faster.
>> 3). If we really want, we can improve the >> operator of TaskGroup to do 2)
>> automatically. If it sees that both sides of >> are TaskGroup, it can
>> create a DummyOperator on behalf of the user. The downside is that it may
>> be too much magic.
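
The saving from options 2) and 3) can be sketched with simple arithmetic. The helper names `direct_edges` and `relayed_edges` below are hypothetical, and the sketch assumes that every task in the downstream group depends on every task in the upstream group, as in the inside_section_1 / inside_section_2 experiment.

```python
def direct_edges(upstream: int, downstream: int) -> int:
    """Edges for group1 >> group2 when every pair of tasks is connected."""
    return upstream * downstream


def relayed_edges(upstream: int, downstream: int) -> int:
    """Edges for group1 >> dummy >> group2 with a single relay task."""
    # Each upstream task points at the relay, and the relay points at each
    # downstream task.
    return upstream + downstream


print(direct_edges(50, 50))   # 2500
print(relayed_edges(50, 50))  # 100
```

The edge count grows with the product of the group sizes in the direct case and with their sum in the relayed case, at the cost of one extra task to schedule.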
>> 
>> Thanks,
>> Qian
>> 
>> # Import paths as of the Airflow 2.0 development branch; adjust them for
>> # the Airflow version you run.
>> from airflow.models.dag import DAG
>> from airflow.operators.dummy_operator import DummyOperator
>> from airflow.utils.dates import days_ago
>> from airflow.utils.task_group import TaskGroup
>> 
>> 
>> def create_section():
>>     """
>>     Create tasks in the outer section.
>>     """
>>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
>> 
>>     with TaskGroup("inside_section_1") as inside_section_1:
>>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
>> 
>>     with TaskGroup("inside_section_2") as inside_section_2:
>>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
>> 
>>     dummies[-1] >> inside_section_1
>>     dummies[-2] >> inside_section_2
>>     # Every task in inside_section_2 depends on every task in
>>     # inside_section_1, which creates 50 * 50 = 2500 edges.
>>     inside_section_1 >> inside_section_2
>> 
>> 
>> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
>>     start = DummyOperator(task_id="start")
>> 
>>     with TaskGroup("section_1") as section_1:
>>         create_section()
>> 
>>     some_other_task = DummyOperator(task_id="some-other-task")
>> 
>>     with TaskGroup("section_2") as section_2:
>>         create_section()
>> 
>>     end = DummyOperator(task_id='end')
>> 
>>     start >> section_1 >> some_other_task >> section_2 >> end
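
The 403 tasks and 5696 edges quoted in the experiment can be reproduced without Airflow by modelling the DAG above with plain Python sets. This is only a sketch under one assumption, which is not spelled out in the thread: `a >> b` on groups connects the leaves of `a` (tasks with no downstream task inside `a`) to the roots of `b` (tasks with no upstream task inside `b`). The helper names `roots`, `leaves` and `chain` are hypothetical.

```python
from itertools import product

edges = set()  # (upstream_task_id, downstream_task_id) pairs


def roots(group):
    """Tasks in the group with no upstream task inside the same group."""
    members = set(group)
    has_upstream = {d for (u, d) in edges if u in members and d in members}
    return [t for t in group if t not in has_upstream]


def leaves(group):
    """Tasks in the group with no downstream task inside the same group."""
    members = set(group)
    has_downstream = {u for (u, d) in edges if u in members and d in members}
    return [t for t in group if t not in has_downstream]


def chain(upstream, downstream):
    """Model `upstream >> downstream` (assumed leaf-to-root semantics)."""
    edges.update(product(leaves(upstream), roots(downstream)))


def create_section(prefix):
    dummies = [f"{prefix}.task-{i + 1}" for i in range(100)]
    inside_1 = [f"{prefix}.inside_section_1.task-{i + 1}" for i in range(50)]
    inside_2 = [f"{prefix}.inside_section_2.task-{i + 1}" for i in range(50)]
    chain([dummies[-1]], inside_1)
    chain([dummies[-2]], inside_2)
    chain(inside_1, inside_2)
    return dummies + inside_1 + inside_2


section_1 = create_section("section_1")
section_2 = create_section("section_2")
chain(["start"], section_1)
chain(section_1, ["some-other-task"])
chain(["some-other-task"], section_2)
chain(section_2, ["end"])

tasks = {"start", "some-other-task", "end", *section_1, *section_2}
print(len(tasks), len(edges))  # 403 5696
```

Each section contributes 2600 edges (50 + 50 + 2500); the four top-level `>>` links add 100, 148, 100 and 148 more, which gives 5696 in total.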
>> 
>> 
>> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
>> <gc...@twitter.com.invalid> wrote:
>> 
>>> Re graph times: that makes sense. Let me know what you find. We may be
>>> able to contribute on the lazy loading part.
>>> 
>>> Looking forward to seeing the updated AIP!
>>> 
>>> 
>>> Gerard Casas Saez
>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
>>> 
>>> 
>>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <ka...@gmail.com> wrote:
>>> 
>>>> Permissions granted, let me know if you face any issues.
>>>> 
>>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yu...@gmail.com> wrote:
>>>> 
>>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
>>>>> 
>>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <ka...@gmail.com> wrote:
>>>>> 
>>>>>> What's your ID? If you haven't created an account yet, please create
>>>>>> one at https://cwiki.apache.org/confluence/signup.action and send us
>>>>>> your ID and we will add permissions.
>>>>>> 
>>>>>>> Thanks. I'll edit the AIP. May I request permission to edit it?
>>>>>>> My wiki user email is yuqian1990@gmail.com.
>>>>>> 
>>>>>> 
>>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yu...@gmail.com> wrote:
>>>>>> 
>>>>>>> Re Xinbin: Thanks. I'll edit the AIP. May I request permission to edit
>>>>>>> it? My wiki user email is yuqian1990@gmail.com.
>>>>>>> 
>>>>>>> Re Gerard: yes, the UI loads all the nodes as JSON from the web server
>>>>>>> at once. However, it only adds the top-level nodes and edges to the
>>>>>>> graph when the Graph View page is first opened, and then adds the
>>>>>>> expanded nodes to the graph as the user expands them. From what I've
>>>>>>> experienced with DAGs containing around 400 tasks (not using TaskGroup
>>>>>>> or SubDagOperator), opening the whole DAG in Graph View usually takes 5
>>>>>>> seconds. Less than 60ms of that is taken by loading the data from the
>>>>>>> webserver. The remaining 4.9s+ is taken by JavaScript functions in
>>>>>>> dagre-d3.min.js such as createNodes, createEdgeLabels, etc. and by
>>>>>>> rendering the graph. With TaskGroup being used to group tasks into a
>>>>>>> smaller number of top-level nodes, the amount of data loaded from the
>>>>>>> webserver will remain about the same compared to a flat DAG of the same
>>>>>>> size, but the number of nodes and edges that need to be plotted on the
>>>>>>> graph can be reduced significantly. So in theory this should reduce the
>>>>>>> time it takes to open Graph View even without lazy-loading the data
>>>>>>> (I'll experiment to find out). That said, if it comes to a point where
>>>>>>> lazy-loading helps, we can still implement it as an improvement.
>>>>>>> 
>>>>>>> Re James: the Tree View looks as if all the groups are fully expanded
>>>>>>> (because under the hood all the tasks are in a single DAG). I'm less
>>>>>>> worried about Tree View at the moment because it already has a
>>>>>>> mechanism for collapsing tasks by the dependency tree. That said, the
>>>>>>> Tree View can definitely be improved too with TaskGroup (e.g. collapse
>>>>>>> tasks in the same TaskGroup when Tree View is first opened).
>>>>>>> 
>>>>>>> For both suggestions, implementing them doesn't require fundamental
>>>>>>> changes to the idea. I think we can have a basic working TaskGroup
>>>>>>> first, and then improve it incrementally in several PRs as we get more
>>>>>>> feedback from the community. What do you think?
>>>>>>> 
>>>>>>> Qian
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jc...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> I agree this looks great. One question: how does the tree view look?
>>>>>>>> 
>>>>>>>> James Coder
>>>>>>>> 
>>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gcasassaez@twitter.com.invalid> wrote:
>>>>>>>>> First of all, this is awesome!!
>>>>>>>>> 
>>>>>>>>> Secondly, checking your UI code, it seems you are loading all
>>>>>>>>> operators at once. Wondering if we can load them as needed (i.e. load
>>>>>>>>> whenever we click the TaskGroup). Some of our DAGs are so large that
>>>>>>>>> they take forever to load on the Graph view, so I'm worried about this
>>>>>>>>> still being an issue here. It may be easily solvable by implementing
>>>>>>>>> lazy loading of the graph. Not sure how easy it is to implement/add to
>>>>>>>>> the UI extension (and I don't want to push for early optimization, as
>>>>>>>>> it's the root of all evil).
>>>>>>>>> Gerard Casas Saez
>>>>>>>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Yu,
>>>>>>>>>> 
>>>>>>>>>> Thank you so much for taking this on. I was fairly distracted
>>>>>>>>>> previously and I didn't have the time to update the proposal. In fact,
>>>>>>>>>> after discussing with Ash, Kaxil and Daniel, the direction of this AIP
>>>>>>>>>> has been changed to favor the concept of TaskGroup instead of rewriting
>>>>>>>>>> SubDagOperator (though it may make sense to deprecate SubDag at a
>>>>>>>>>> future date).
>>>>>>>>>> 
>>>>>>>>>> Your PR is amazing and it has implemented the desired features. I think
>>>>>>>>>> we can focus on your new PR instead. Do you mind updating the AIP based
>>>>>>>>>> on what you have done in your PR?
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Bin
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yuqian1990@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi all, I've added the basic UI changes to my proposed implementation
>>>>>>>>>>> of TaskGroup as a UI grouping concept:
>>>>>>>>>>> https://github.com/apache/airflow/pull/10153
>>>>>>>>>>> 
>>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup so I'm
>>>>>>>>>>> quoting it here. The only thing I don't fully agree with is the
>>>>>>>>>>> restriction "... *cannot* have dependencies between a Task in a
>>>>>>>>>>> TaskGroup and either a Task in a different TaskGroup or a Task not in
>>>>>>>>>>> any group". I think this is overly restrictive. Since TaskGroup is a UI
>>>>>>>>>>> concept, tasks can have dependencies on tasks in other TaskGroups or
>>>>>>>>>>> on tasks not in any TaskGroup. In my PR, this is allowed. The graph
>>>>>>>>>>> edges will update accordingly when TaskGroups are expanded/collapsed.
>>>>>>>>>>> TaskGroup is only helping to make the UI look less crowded. Under the
>>>>>>>>>>> hood, everything is still a DAG of tasks and edges so things work
>>>>>>>>>>> normally. Here's a screenshot
>>>>>>>>>>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
>>>>>>>>>>> of the UI interaction.
>>>>>>>>>>> 
>>>>>>>>>>> - Tasks can be added to a TaskGroup
>>>>>>>>>>> - You *can* have dependencies between Tasks in the same TaskGroup, but
>>>>>>>>>>>   *cannot* have dependencies between a Task in a TaskGroup and either a
>>>>>>>>>>>   Task in a different TaskGroup or a Task not in any group
>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either other
>>>>>>>>>>>   TaskGroups or Tasks not in any group
>>>>>>>>>>> - The UI will by default render a TaskGroup as a single "object", but
>>>>>>>>>>>   which you expand or zoom into in some way
>>>>>>>>>>> - You'd need some way to determine what the "status" of a TaskGroup was
>>>>>>>>>>>   at least for UI display purposes
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Regarding Jake's comment, I agree it's possible to implement the
>>>>>>>>>>> "retrying tasks in a group" pattern he mentioned as an optional feature
>>>>>>>>>>> of TaskGroup, although that may go against having TaskGroup as a pure
>>>>>>>>>>> UI concept. For the motivating example Jake provided, I suggest
>>>>>>>>>>> implementing both SubmitLongRunningJobTask and PollJobStatusSensor in a
>>>>>>>>>>> single operator. It can do something like what
>>>>>>>>>>> BaseSensorOperator.execute() does in "reschedule" mode, i.e. it first
>>>>>>>>>>> executes some code to submit the long running job to the external
>>>>>>>>>>> service and stores the state (e.g. in XCom), then reschedules itself.
>>>>>>>>>>> Subsequent runs then poke for the completion state.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero <jferriero@google.com.invalid> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I really like this idea of a TaskGroup container as I think this will
>>>>>>>>>>>> be much easier to use than SubDag.
>>>>>>>>>>>> 
>>>>>>>>>>>> I'd like to propose an optional behavior for special retry mechanics
>>>>>>>>>>>> via a TaskGroup.retry_all property.
>>>>>>>>>>>> This way I could use TaskGroup to replace my favorite use of SubDag for
>>>>>>>>>>>> atomically retrying tasks of the pattern "act on external state then
>>>>>>>>>>>> reschedule poll until desired state reached".
>>>>>>>>>>>> 
>>>>>>>>>>>> The motivating use case I have for a SubDag is a very simple two-task
>>>>>>>>>>>> group [SubmitLongRunningJobTask >> PollJobStatusSensor].
>>>>>>>>>>>> I use SubDag because it gives me an easy way to retry the
>>>>>>>>>>>> SubmitJobTask if something about the PollJobSensor fails.
>>>>>>>>>>>> This pattern would be really nice for jobs that are expected to run a
>>>>>>>>>>>> long time (because the sensor can use reschedule mode, freeing up
>>>>>>>>>>>> slots) but might fail for a retryable reason.
>>>>>>>>>>>> However, using SubDag to meet this use case defeats the purpose because
>>>>>>>>>>>> SubDag infamously
>>>>>>>>>>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
>>>>>>>>>>>> blocks a "controller" slot for the entire duration.
>>>>>>>>>>>> This may feel like a cyclic behavior but in reality it is very common
>>>>>>>>>>>> for a single operator to submit a job and wait until it is done.
>>>>>>>>>>>> We could use this case to refactor many operators (e.g. BQ, Dataproc,
>>>>>>>>>>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with
>>>>>>>>>>>> an optional reschedule mode if the user knows that this job may take a
>>>>>>>>>>>> long time.
>>>>>>>>>>>> 
>>>>>>>>>>>> I'd be happy to do the development work on adding this specific retry
>>>>>>>>>>>> behavior to TaskGroup once the base concept is implemented if others in
>>>>>>>>>>>> the community would find this a useful feature.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Jake
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> All for it :). I think we are getting closer to having regular
>>>>>>>>>>>>> planning and a structured approach to 2.0, and to starting a task
>>>>>>>>>>>>> force for it soon, so I think this should be perfectly fine to discuss
>>>>>>>>>>>>> and even start implementing what's beyond as soon as we make sure that
>>>>>>>>>>>>> we are prioritizing 2.0 work.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> J,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yuqian1990@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Jarek,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I agree we should not change the behaviour of the existing
>>>>>>>>>>>>>> SubDagOperator till Airflow 2.1. Is it okay to continue the discussion
>>>>>>>>>>>>>> about TaskGroup as a brand new concept/feature independent from the
>>>>>>>>>>>>>> existing SubDagOperator? In other words, shall we add TaskGroup as a
>>>>>>>>>>>>>> UI grouping concept like Ash suggested, and not touch SubDagOperator
>>>>>>>>>>>>>> at all? Whenever we are ready with TaskGroup, we then deprecate
>>>>>>>>>>>>>> SubDagOperator in Airflow 2.1.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I really like Ash's idea of simplifying the SubDagOperator idea into a
>>>>>>>>>>>>>> simple UI grouping concept. I think Xinbin's idea of "reattaching all
>>>>>>>>>>>>>> the tasks to the root DAG" is the way to go. And I see James pointed
>>>>>>>>>>>>>> out we need some helper functions to simplify dependencies setting of
>>>>>>>>>>>>>> TaskGroup. Xinbin put up a pretty elegant example in his PR
>>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I think having
>>>>>>>>>>>>>> TaskGroup as a UI concept should be a relatively small change. We can
>>>>>>>>>>>>>> simplify Xinbin's PR further. So I put up this alternative proposal
>>>>>>>>>>>>>> here: https://github.com/apache/airflow/pull/10153
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I have not done any UI changes due to lack of experience with web UI.
>>>>>>>>>>>>>> If anyone's interested, please take a look at the PR.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Qian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Similar point here to the other ideas that are popping up. Maybe we
>>>>>>>>>>>>>>> should just focus on completing 2.0 and move all discussions about
>>>>>>>>>>>>>>> further improvements to 2.1? While those are important discussions
>>>>>>>>>>>>>>> (and we should continue them in the near future!), I think at this
>>>>>>>>>>>>>>> point delivering 2.0 in its current shape should be our focus.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> J.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <bin.huangxb@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Daniel
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same API as a DAG object
>>>>>>>>>>>>>>>> related to task dependencies, but it will not have anything related
>>>>>>>>>>>>>>>> to actual execution or scheduling.
>>>>>>>>>>>>>>>> I will update the AIP according to this over the weekend.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when you import the
>>>>>>>>>>>>>>>>> object you can import it with parameters to determine the shape of
>>>>>>>>>>>>>>>>> the DAG.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it serve a similar purpose
>>>>>>>>>>>>>>>> to a DAG factory function?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <daniel.imberman@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g. the
>>>>>>>>>>>>>>>>> bitwise operator for task dependencies)? We could even make a
>>>>>>>>>>>>>>>>> “DAGTemplate” object s.t. when you import the object you can import
>>>>>>>>>>>>>>>>> it with parameters to determine the shape of the DAG.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <bin.huangxb@gmail.com> wrote:
>>>>>>>>>>>>>>>>> The TaskGroup will not take a schedule interval as a parameter
>>>>>>>>>>>>>>>>> itself; it depends on the DAG it is attached to. In my opinion, the
>>>>>>>>>>>>>>>>> TaskGroup will only contain a group of tasks with interdependencies,
>>>>>>>>>>>>>>>>> and the TaskGroup behaves like a task. It doesn't contain any
>>>>>>>>>>>>>>>>> execution/scheduling logic (i.e. schedule_interval, concurrency,
>>>>>>>>>>>>>>>>> max_active_runs etc.) like a DAG does.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> For example, there is the scenario that the schedule interval of
>>>>>>>>>>>>>>>>>> DAG is 1 hour and the schedule interval of TaskGroup is 20 min.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case that you want to
>>>>>>>>>>>>>>>>> achieve?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Bin
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <thanosxnicholas@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>>>>> Using TaskGroup, is the schedule interval of TaskGroup the same as
>>>>>>>>>>>>>>>>>> that of the parent DAG? My main concern is whether the schedule
>>>>>>>>>>>>>>>>>> interval of TaskGroup could be different from that of the DAG. For
>>>>>>>>>>>>>>>>>> example, there is the scenario that the schedule interval of DAG is
>>>>>>>>>>>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 min.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>> Nicholas
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi Nicholas,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of SubDagOperator; maybe it
>>>>>>>>>>>>>>>>>>> will throw an error? But in the original proposal, the subdag's
>>>>>>>>>>>>>>>>>>> schedule_interval will be ignored. Or if we decide to use TaskGroup
>>>>>>>>>>>>>>>>>>> to replace SubDag, there will be no subdag schedule_interval.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Bin
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <thanosxnicholas@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was unsure whether the schedule
>>>>>>>>>>>>>>>>>>>> interval of a SubDAG can be different from that of the parent DAG.
>>>>>>>>>>>>>>>>>>>> I have discussed the schedule interval of SubDAG with Jiajie Zhong.
>>>>>>>>>>>>>>>>>>>> If the SubDagOperator has a different schedule interval, what will
>>>>>>>>>>>>>>>>>>>> happen when the scheduler schedules the parent DAG?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>> Nicholas Jiang
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone for your feedback!
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I have rethought the concept of subdag and task groups. I think the
>>>>>>>>>>>>>>>>>>>>> better way to approach this is to entirely remove subdag and
>>>>>>>>>>>>>>>>>>>>> introduce the concept of TaskGroup, which is a container of tasks
>>>>>>>>>>>>>>>>>>>>> along with their dependencies *without the execution/scheduling
>>>>>>>>>>>>>>>>>>>>> logic of a DAG*. The only purpose of it is to group a list of
>>>>>>>>>>>>>>>>>>>>> tasks, but you still need to add it to a DAG for execution.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Here is a small code snippet.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>> class TaskGroup:
>>>>>>>>>>>>>>>>>>>>>     """
>>>>>>>>>>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
>>>>>>>>>>>>>>>>>>>>>     """
>>>>>>>>>>>>>>>>>>>>>     def __init__(self, group_id, default_args=None):
>>>>>>>>>>>>>>>>>>>>>         pass
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> # You can add tasks to a task group similar to adding tasks to a DAG.
>>>>>>>>>>>>>>>>>>>>> # This can be declared in a separate file from the dag file.
>>>>>>>>>>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
>>>>>>>>>>>>>>>>>>>>> download_group.add_task(task1)
>>>>>>>>>>>>>>>>>>>>> task2.dag = download_group
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> with download_group:
>>>>>>>>>>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> [task1, task2] >> task3
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> # Add it to a DAG for execution
>>>>>>>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
>>>>>>>>>>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
>>>>>>>>>>>>>>>>>>>>>     start = DummyOperator(task_id='start')
>>>>>>>>>>>>>>>>>>>>>     start >> download_group
>>>>>>>>>>>>>>>>>>>>>     # this is equivalent to
>>>>>>>>>>>>>>>>>>>>>     # start >> [task1, task2] >> task3
>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> With this, we can still reuse a group of tasks and set dependencies
>>>>>>>>>>>>>>>>>>>>> between them; it avoids the boilerplate code from using
>>>>>>>>>>>>>>>>>>>>> SubDagOperator, and we can declare dependencies as
>>>>>>>>>>>>>>>>>>>>> `task >> task_group >> task`.
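One way the `task >> task_group` behaviour could work is sketched below in plain Python, without Airflow. The `Node`, `Task` and `TaskGroup` classes here are toy stand-ins written for this illustration (the classes in the PRs may look quite different): an edge into a group is wired to the group's root tasks, and an edge out of a group is wired from its leaf tasks.

```python
from typing import List, Set


def _as_list(item) -> list:
    return list(item) if isinstance(item, (list, tuple)) else [item]


def _link(upstream, downstream) -> None:
    """Connect every leaf of `upstream` to every root of `downstream`."""
    for up in _as_list(upstream):
        for leaf in up.leaves():
            for down in _as_list(downstream):
                for root in down.roots():
                    leaf.downstream.add(root.task_id)
                    root.upstream.add(leaf.task_id)


class Node:
    """Shared `>>` plumbing for tasks and groups."""

    def __rshift__(self, other):   # self >> other
        _link(self, other)
        return other

    def __rrshift__(self, other):  # [a, b] >> self (lists have no __rshift__)
        _link(other, self)
        return self


class Task(Node):
    def __init__(self, task_id: str):
        self.task_id = task_id
        self.upstream: Set[str] = set()
        self.downstream: Set[str] = set()

    def roots(self) -> List["Task"]:
        return [self]

    def leaves(self) -> List["Task"]:
        return [self]


class TaskGroup(Node):
    def __init__(self, group_id: str):
        self.group_id = group_id
        self.tasks: List[Task] = []

    def add_task(self, task: Task) -> Task:
        self.tasks.append(task)
        return task

    def roots(self) -> List[Task]:
        ids = {t.task_id for t in self.tasks}
        # Roots have no upstream task inside the group.
        return [t for t in self.tasks if not (t.upstream & ids)]

    def leaves(self) -> List[Task]:
        ids = {t.task_id for t in self.tasks}
        # Leaves have no downstream task inside the group.
        return [t for t in self.tasks if not (t.downstream & ids)]


download_group = TaskGroup("download")
task1, task2, task3 = (download_group.add_task(Task(f"task{i}")) for i in (1, 2, 3))
[task1, task2] >> task3

start = Task("start")
start >> download_group  # same edges as: start >> [task1, task2]

print(sorted(start.downstream))  # ['task1', 'task2']
print(sorted(task3.upstream))    # ['task1', 'task2']
```

The group itself never becomes a node in the dependency graph; it only decides which of its member tasks receive the edge.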
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> User migration wise, we can introduce it before Airflow 2.0 and
>>>>>>>>>>>>>>>>>>>>> allow a gradual transition. Then we can decide if we still want to
>>>>>>>>>>>>>>>>>>>>> keep the SubDagOperator or simply remove it.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Any thoughts?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> Bin
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> +1, proposal looks good.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> The original intention was really to have task groups and a
>>>>>>>>>>>>>>>>>>>>>> zoom-in/out in the UI. The original reasoning was to reuse the DAG
>>>>>>>>>>>>>>>>>>>>>> object since it is a group of tasks, but as highlighted here it
>>>>>>>>>>>>>>>>>>>>>> does create underlying confusion since a DAG is much more than
>>>>>>>>>>>>>>>>>>>>>> just a group of tasks.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Max
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <joshipoornima06@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Thank you for your email.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
>>>>>>>>>>>>>>>>>>>>>>>>>>   *DagBag.bag_dag* method to unpack subdag while parsing, and
>>>>>>>>>>>>>>>>>>>>>>>>>>   it will give a flat structure at the task level
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I think. At
>>>>>>>>>>>>>>>>>>>>>>>>> least if I've understood your idea here correctly.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I am not sure about the serialized_dag representation, but at
>>>>>>>>>>>>>>>>>>>>>>>> least it will still keep the subdag entry in the DAG table? In
>>>>>>>>>>>>>>>>>>>>>>>> my proposal, as also in the draft PR, the idea is to *extract
>>>>>>>>>>>>>>>>>>>>>>>> the tasks from the subdag and add them back to the root_dag*.
>>>>>>>>>>>>>>>>>>>>>>>> So the runtime DAG graph will look exactly the same as without
>>>>>>>>>>>>>>>>>>>>>>>> subdag but with metadata attached to those sections. This
>>>>>>>>>>>>>>>>>>>>>>>> metadata will later be used for rendering in the UI. So after
>>>>>>>>>>>>>>>>>>>>>>>> parsing (*DagBag.process_file()*), it will just output the
>>>>>>>>>>>>>>>>>>>>>>>> *root_dag* instead of *root_dag + subdag + subdag + nested
>>>>>>>>>>>>>>>>>>>>>>>> subdag* etc.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata current_group=section-1,
>>>>>>>>>>>>>>>>>>>>>>>>   parent_group=<the-root-dag-id> (naming suggestions welcome).
>>>>>>>>>>>>>>>>>>>>>>>>   The reason for parent_group is that we can have nested groups
>>>>>>>>>>>>>>>>>>>>>>>>   and still be able to capture the dependency.
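A rough sketch of the "extract the tasks and attach metadata" step, in plain Python rather than against the real `DagBag`. The `Task` and `Dag` dataclasses and the `flatten` helper are hypothetical stand-ins; only the `current_group` / `parent_group` names come from the message above. Tagging root-level tasks with the root dag id as their `current_group` is an assumption of this sketch, and dependencies between tasks are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    task_id: str
    current_group: Optional[str] = None
    parent_group: Optional[str] = None


@dataclass
class Dag:
    dag_id: str
    tasks: List[Task] = field(default_factory=list)
    subdags: List["Dag"] = field(default_factory=list)


def flatten(dag: Dag, parent_group: Optional[str] = None) -> List[Task]:
    """Return every task under `dag` as one flat list, tagged with its group."""
    flat = [
        Task(t.task_id, current_group=dag.dag_id, parent_group=parent_group)
        for t in dag.tasks
    ]
    for sub in dag.subdags:
        # Tasks of a nested subdag remember the enclosing group as their parent.
        flat.extend(flatten(sub, parent_group=dag.dag_id))
    return flat


inner = Dag("section-1-inner", tasks=[Task("section-1-inner-task-1")])
section_1 = Dag(
    "section-1",
    tasks=[Task("section-1-task-1"), Task("section-1-task-2")],
    subdags=[inner],
)
root = Dag("example_dag", tasks=[Task("start"), Task("end")], subdags=[section_1])

flat_tasks = flatten(root)
for task in flat_tasks:
    print(task.task_id, task.current_group, task.parent_group)
# start example_dag None
# end example_dag None
# section-1-task-1 section-1 example_dag
# section-1-task-2 section-1 example_dag
# section-1-inner-task-1 section-1-inner section-1
```

With both fields present, a UI can rebuild the nesting (inner group inside section-1 inside the root) from the flat task list alone.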
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Runtime DAG:
>>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> While at the UI, what we see would be something like this by
>>>>>>>>>>>>>>>>>>>>>>>> utilizing the metadata, and then we can expand or zoom in in
>>>>>>>>>>>>>>>>>>>>>>>> some way.
>>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> The benefits I can see are:
>>>>>>>>>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra complexity of SubDag for
>>>>>>>>>>>>>>>>>>>>>>>> execution and scheduling. It will be the same as not using
>>>>>>>>>>>>>>>>>>>>>>>> SubDag.
>>>>>>>>>>>>>>>>>>>>>>>> 2. We still have the benefits of modularized and reusable dag
>>>>>>>>>>>>>>>>>>>>>>>> code and can declare dependencies between them. And with the
>>>>>>>>>>>>>>>>>>>>>>>> new SubDagOperator (see AIP or draft PR), we can use the same
>>>>>>>>>>>>>>>>>>>>>>>> dag_factory function for generating 1 dag, a lot of dynamic
>>>>>>>>>>>>>>>>>>>>>>>> dags, or for a SubDag (in this case, it will just extract all
>>>>>>>>>>>>>>>>>>>>>>>> underlying tasks and append them to the root dag).
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag with a simpler
>>>>>>>>>>>>>>>>>>>>>>>>   concept by Ash: the proposed change basically drains out the
>>>>>>>>>>>>>>>>>>>>>>>>   contents of a SubDag and becomes more like
>>>>>>>>>>>>>>>>>>>>>>>>   ExtractSubdagTasksAndAppendToRootdagOperator (forgive me for
>>>>>>>>>>>>>>>>>>>>>>>>   the crazy name..). In this case, is it still necessary to keep
>>>>>>>>>>>>>>>>>>>>>>>>   the concept of subdag, as it is nothing more than a name?
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks to Chris Palmer
>>>>>>>>>>>>>>>>>>>>>>>> for helping conceptualize the functionality of TaskGroup; I will
>>>>>>>>>>>>>>>>>>>>>>>> just paste it here.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
>>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in the same
>>>>>>>>>>>>>>>>>>>>>>>>>   TaskGroup, but *cannot* have dependencies between a Task in a
>>>>>>>>>>>>>>>>>>>>>>>>>   TaskGroup and either a Task in a different TaskGroup or a Task
>>>>>>>>>>>>>>>>>>>>>>>>>   not in any group
>>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either
>>>>>>>>>>>>>>>>>>>>>>>>>   other TaskGroups or Tasks not in any group
>>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup as a single
>>>>>>>>>>>>>>>>>>>>>>>>>   "object", but which you expand or zoom into in some way
>>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the "status" of a
>>>>>>>>>>>>>>>>>>>>>>>>>   TaskGroup was at least for UI display purposes
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> I agree with Chris:
>>>>>>>>>>>>>>>>>>>>>>>> - From the backend's view (scheduler & executor), I think
>>>>>>>>>>>>>>>>>>>>>>>>   TaskGroup should be ignored during execution (unless we decide
>>>>>>>>>>>>>>>>>>>>>>>>   to implement some metadata operations that allow
>>>>>>>>>>>>>>>>>>>>>>>>   starting/stopping a group of tasks etc.).
>>>>>>>>>>>>>>>>>>>>>>>> - From the UI's view, it should be able to pick up the
>>>>>>>>>>>>>>>>>>>>>>>>   individual tasks' status and then determine the TaskGroup's
>>>>>>>>>>>>>>>>>>>>>>>>   status.
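How a group status might be derived from its tasks' states, as a small plain-Python sketch. The state strings mirror Airflow task-instance state names, but the precedence order and the `group_status` helper are assumptions made for this illustration; the thread does not settle on any particular rule.

```python
from typing import Iterable, List, Optional

# Hypothetical precedence, most attention-worthy first. None means the task
# has no recorded state yet, so a partly unstarted group is not shown as done.
PRECEDENCE: List[Optional[str]] = [
    "failed",
    "upstream_failed",
    "up_for_retry",
    "running",
    "queued",
    "scheduled",
    None,
    "success",
    "skipped",
]


def group_status(task_states: Iterable[Optional[str]]) -> Optional[str]:
    """Pick the highest-precedence state found among the group's tasks."""
    states = set(task_states)
    for state in PRECEDENCE:
        if state in states:
            return state
    return None  # empty group, or only states this sketch does not know


print(group_status(["success", "running", "success"]))  # running
print(group_status(["success", "failed", "running"]))   # failed
print(group_status(["success", "skipped"]))             # success
print(group_status(["success", None]))                  # None
```

This keeps the backend untouched: the status is computed purely for display from task states that already exist.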
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Bin
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman <daniel.imberman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator to tie dags
>>>>>>>>>>>>>>>>>>>>>>>>> together but I think that sounds pretty great! I wonder if we
>>>>>>>>>>>>>>>>>>>>>>>>> could essentially write in the ability to set dependencies to
>>>>>>>>>>>>>>>>>>>>>>>>> all starter-tasks for that DAG.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly UI concept. It
>>>>>>>>>>>>>>>>>>>>>>>>> doesn’t need to execute separately, you’re just adding more
>>>>>>>>>>>>>>>>>>>>>>>>> tasks to the queue that will be executed when there are
>>>>>>>>>>>>>>>>>>>>>>>>> resources available.
>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <chris@crpalmer.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
>>>>>>>>>>>>>> abstraction.
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
>>>>>>>>>> high
>>>>>>>>>>>>> level
>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>>> functionality:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
>>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
>> in
>>>>>>>>>>> the
>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>>> TaskGroup,
>>>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
>> a
>>>>>>>>>>>>>> TaskGroup
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> either
>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
>> in
>>>>>>>>>>> any
>>>>>>>>>>>>>> group
>>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
>>>>>>>>>>> TaskGroup
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> either
>>>>>>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
>>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
>>>>>>>>>> as a
>>>>>>>>>>>>>> single
>>>>>>>>>>>>>>>>>>> "object",
>>>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
>>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the
>>>>>>>>>>>> "status"
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> TaskGroup
>>>>>>>>>>>>>>>>>>>>>> was
>>>>>>>>>>>>>>>>>>>>>>>>> at least for UI display purposes
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
>>>>>>>>>>> object
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>> its
>>>>>>>>>>>>>>>>>> own
>>>>>>>>>>>>>>>>>>>>>> database
>>>>>>>>>>>>>>>>>>>>>>>>> table and model or just another attribute on
>>>>>>>>>>>> tasks.
>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>>>>>> build
>>>>>>>>>>>>>>>>>>>>>>>>> it in a way such that from the schedulers
>>>>>>>>>> point
>>>>>>>>>>> of
>>>>>>>>>>>>>> view
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> DAG
>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
>>>>>>>>>> differently.
>>>>>>>>>>> So
>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>> really
>>>>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>> shortcut for setting dependencies between
>> sets
>>>>>>>>>>> of
>>>>>>>>>>>>>> Tasks,
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> allows
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> UI
>>>>>>>>>>>>>>>>>>>>>>>>> to simplify the render of the DAG structure.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov <ddavydov@twitter.com.invalid> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Agree with James (and think it's actually the more important issue to fix),
>>>>>>>>>>>>>>>>>>>>>>>>>> but I am still convinced Ash' idea is the right way forward (just it might
>>>>>>>>>>>>>>>>>>>>>>>>>> require a bit more work to deprecate than adding visual grouping in the
>>>>>>>>>>>>>>>>>>>>>>>>>> UI).
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> There was a previous thread about this FYI with more context on why subdags
>>>>>>>>>>>>>>>>>>>>>>>>>> are bad and potential solutions:
>>>>>>>>>>>>>>>>>>>>>>>>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html . A
>>>>>>>>>>>>>>>>>>>>>>>>>> solution I outline there to Jame's problem is e.g. enabling the `>>` operator
>>>>>>>>>>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as well. I see this being separate
>>>>>>>>>>>>>>>>>>>>>>>>>> from Ash' solution for DAG grouping in the UI but one of the two items
>>>>>>>>>>>>>>>>>>>>>>>>>> required to replace all existing subdag functionality.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years and they are always a giant pain
>>>>>>>>>>>>>>>>>>>>>>>>>> to use. They are a constant source of user confusion and breakages during
>>>>>>>>>>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James Coder <jcoder01@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a UI concept. I use the subdag
>>>>>>>>>>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If you have a group of tasks that
>>>>>>>>>>>>>>>>>>>>>>>>>>> need to finish before another group of tasks start, using a subdag is a
>>>>>>>>>>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies and I think also make it easier
>>>>>>>>>>>>>>>>>>>>>>>>>>> to follow the dag code.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin.kn@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <ash@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Question:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator anymore?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just replacing it with a UI grouping
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less to get wrong, and closer to what
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> users actually want to achieve with subdags?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in subdags could start running in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should we not also just _entirely_ remove
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace it with something simpler?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I haven't used them extensively so may
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it has(?) to be of the form
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   `parent_dag_id.subdag_id`.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own schedule_interval, but it has to match the parent
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   dag
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own. (Does it make sense to do this?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   Pausing just a sub dag would mean the sub dag would never execute, so
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   the SubDagOperator would fail too.)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to operate a subdag with -- always a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   bit of a kludge.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <ash@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm excited to see how this progresses.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and it will give a flat structure at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the task level
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I think. At least if
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here correctly.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huangxb@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone and collect feedback on the AIP-34 on
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was previously briefly mentioned in the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be done for Airflow 2.0, and one of the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator attach tasks back to the root DAG.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving SubDagOperator related issues by reattaching
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag while respecting dependencies during
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping effect on the UI will be achieved through
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory function more reusable because you don't
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and child_dag_name in the function signature
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> anymore.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and it will give a flat structure at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the task level
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts like a
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> container and most of the original methods are removed. The signature is
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also changed to *subdag_factory* with *subdag_args* and *subdag_kwargs*.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is similar to the PythonOperator signature.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group & parent_group attributes
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to BaseOperator*: This metadata is used to group tasks for rendering at
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> UI level. It may potentially extend further to group arbitrary tasks
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to allow group-level operations (i.e.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> stop/trigger a group of task within the dag)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to allow
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a flat structure to pair with the first
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> change instead of the original hierarchical structure.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please see related documents and PRs for details:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AIP:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any aspects that you agree/disagree with or
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> need more clarification (especially the third change regarding TaskGroup).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Any comments are welcome and I am looking forward to it!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bin
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Kyle Hamlin
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> Thanks & Regards
>>>>>>>>>>>>>>>>>>>>>>> Poornima
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Jarek Potiuk
>>>>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal
>> Software
>>>>>> Engineer
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129>
>> <+48660796129
>>>>>>>>>>>>> <+48%20660%20796%20129>>
>>>>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jarek Potiuk
>>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software
>>>>> Engineer
>>>>>>>>>>>>> 
>>>>>>>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
>>>>>>>>>>>>> <+48%20660%20796%20129>>
>>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> 
>>>>>>>>>>>> *Jacob Ferriero*
>>>>>>>>>>>> 
>>>>>>>>>>>> Strategic Cloud Engineer: Data Engineering
>>>>>>>>>>>> 
>>>>>>>>>>>> jferriero@google.com
>>>>>>>>>>>> 
>>>>>>>>>>>> 617-714-2509
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 


Re: [AIP-34] Rewrite SubDagOperator

Posted by Gerard Casas Saez <gc...@twitter.com.INVALID>.
Is it not possible to solve this at the UI level? That is, tell dagre to add
only one edge to the group instead of one to every node in the group? There is
no need for SubDag behaviour, just fewer edges on the graph. That should reduce
load time, if I understand correctly.
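
A rough sketch of that edge reduction, in plain Python rather than dagre's
own API. The helper name collapse_edges and the group_of mapping are
hypothetical, and the 50-by-50 shape is borrowed from the made-up DAG in the
quoted message below:

```python
def collapse_edges(edges, group_of):
    """Replace grouped task ids by their group id and de-duplicate the edges.

    edges:    iterable of (upstream_task_id, downstream_task_id) pairs
    group_of: dict mapping a task id to the id of its collapsed group;
              tasks that are not in a collapsed group are simply absent
    """
    collapsed = set()
    for upstream, downstream in edges:
        src = group_of.get(upstream, upstream)
        dst = group_of.get(downstream, downstream)
        if src != dst:  # edges internal to one collapsed group disappear
            collapsed.add((src, dst))
    return collapsed


# Two groups of 50 tasks; every task in g2 depends on every task in g1.
g1 = [f"g1.task-{i + 1}" for i in range(50)]
g2 = [f"g2.task-{i + 1}" for i in range(50)]
edges = [(up, down) for up in g1 for down in g2]
group_of = {**{t: "g1" for t in g1}, **{t: "g2" for t in g2}}

collapsed = collapse_edges(edges, group_of)
print(len(edges), len(collapsed))
```

With both groups collapsed, the 2500 task-level edges reduce to a single
group-level edge, so the layout step has far less to place.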

I would strongly avoid the Dummy operator since it will introduce delays in
operator execution (it will need to execute one dummy operator, and that
can be expensive imo).
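
For reference, the edge arithmetic behind the dummy-operator suggestion can
be sketched as below. This only counts edges and says nothing about the
scheduling cost of the extra task; the function names are illustrative and
not part of Airflow:

```python
def direct_edges(n_upstream: int, n_downstream: int) -> int:
    """Edges when every downstream task depends on every upstream task."""
    return n_upstream * n_downstream


def edges_via_dummy(n_upstream: int, n_downstream: int) -> int:
    """Edges when a single pass-through task sits between the two groups."""
    return n_upstream + n_downstream


# The two inner groups in the quoted experiment hold 50 tasks each.
print(direct_edges(50, 50))     # 2500
print(edges_via_dummy(50, 50))  # 100
```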

Overall, though, the proposal looks good. Unless anyone opposes it, I would
move this to vote mode :D

Gerard Casas Saez
Twitter | Cortex | @casassaez <http://twitter.com/casassaez>


On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yu...@gmail.com> wrote:

> Hi, All,
> Here's the updated AIP-34
> <
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator
> >.
> The PR has been fine-tuned with better UI interactions and added
> serialization of TaskGroup: https://github.com/apache/airflow/pull/10153
>
> Here are some experiment results:
> A made-up DAG containing 403 tasks and 5696 edges, grouped as in the code
> at the end of this message. Note that inside_section_2 is intentionally made
> to depend on all tasks in inside_section_1 to generate a large number of
> edges. The observation is that opening the top-level graph is very quick,
> around 270ms. Expanding groups that don't have a lot of dense dependencies on
> other groups is also hardly noticeable, e.g. expanding section_1 takes 330ms.
> The part that takes time is expanding both inside_section_1 and
> inside_section_2. Because there are 2500 edges between these two inner
> groups, it took 63 seconds to expand both of them. The majority of the time
> (more than 62 seconds) is actually taken by the layout() function in dagre.
> In other words, it's very fast to add nodes and edges, but laying them out on
> the graph takes time. This issue is not actually specific to TaskGroup.
> Without TaskGroup, if a DAG contains too many edges, it takes time to lay out
> the graph too.
>
> On the other hand, a more realistic experiment with a production DAG
> containing about 400 tasks and 700 edges showed that grouping tasks into
> three levels of nested TaskGroup cut the upfront page opening time from
> around 6s to 500ms. (Obviously the time is paid back when the user gradually
> expands all the groups one by one, but normally people don't need to expand
> every group every time, so it's still a big saving.) The experiments were
> done on OS X Mojave, 2.2 GHz Intel Core i7, 16GB memory, Chrome.
>
> I can see a few possible improvements to TaskGroup (or how it's used) that
> can be done as a next-step:
> 1). Like Gerard suggested, we can implement lazy-loading. Instead of
> displaying the whole DAG, we can limit the Graph View to show only a single
> TaskGroup, omitting its edges going out to other TaskGroups. This behaviour
> is more like SubDagOperator where users can zoom into/out of a TaskGroup
> and look at only tasks within that TaskGroup as if those are the only tasks
> on the DAG. This can be done with either background javascript calls or by
> making a new get request with filtering parameters. Obviously the downside
> is that it's not as explicit as showing all the dependencies on the graph.
> 2). Users can improve the organization of the DAG themselves to reduce the
> number of edges. E.g. if every task in group2 depends on every task in
> group1, instead of doing group1 >> group2, they can add a DummyOperator in
> between and do this: group1 >> dummy >> group2. This cuts down the number
> of edges significantly and page load becomes much faster.
> 3). If we really want, we can improve the >> operator of TaskGroup to do 2)
> automatically. If it sees that both sides of >> are TaskGroup, it can
> create a DummyOperator on behalf of the user. The downside is that it may
> be too much magic.
>
> Thanks,
> Qian
>
> # Import paths are an assumption (Airflow master around the time of the PR).
> from airflow.models.dag import DAG
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.utils.dates import days_ago
> from airflow.utils.task_group import TaskGroup
>
>
> def create_section():
>     """
>     Create tasks in the outer section.
>     """
>     # 100 plain tasks that live directly in the enclosing group.
>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
>
>     with TaskGroup("inside_section_1") as inside_section_1:
>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
>
>     with TaskGroup("inside_section_2") as inside_section_2:
>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
>
>     dummies[-1] >> inside_section_1
>     dummies[-2] >> inside_section_2
>     # Every task in inside_section_2 depends on every task in inside_section_1.
>     inside_section_1 >> inside_section_2
>
>
> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
>     start = DummyOperator(task_id="start")
>
>     with TaskGroup("section_1") as section_1:
>         create_section()
>
>     some_other_task = DummyOperator(task_id="some-other-task")
>
>     with TaskGroup("section_2") as section_2:
>         create_section()
>
>     end = DummyOperator(task_id='end')
>
>     start >> section_1 >> some_other_task >> section_2 >> end
>
>
> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
> <gc...@twitter.com.invalid> wrote:
>
> > Re graph times. That makes sense. Let me know what you find. We may be able
> > to contribute on the lazy loading part.
> >
> > Looking forward to see the updated AIP!
> >
> >
> > Gerard Casas Saez
> > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> >
> >
> > On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > > Permissions granted, let me know if you face any issues.
> > >
> > > On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yu...@gmail.com> wrote:
> > >
> > > > Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> > > >
> > > > On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <ka...@gmail.com> wrote:
> > > >
> > > > > What's your ID i.e. if you haven't created an account yet, please create
> > > > > one at https://cwiki.apache.org/confluence/signup.action and send us your
> > > > > ID and we will add permissions.
> > > > >
> > > > > Thanks. I'll edit the AIP. May I request permission to edit it?
> > > > > > My wiki user email is yuqian1990@gmail.com.
> > > > >
> > > > >
> > > > > On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yu...@gmail.com> wrote:
> > > > >
> > > > > > Re, Xinbin. Thanks. I'll edit the AIP. May I request permission to edit it?
> > > > > > My wiki user email is yuqian1990@gmail.com.
> > > > > >
> > > > > > Re Gerard: yes the UI loads all the nodes as json from the web server at
> > > > > > once. However, it only adds the top level nodes and edges to the graph when
> > > > > > the Graph View page is first opened. And then adds the expanded nodes to
> > > > > > the graph as the user expands them. From what I've experienced with DAGs
> > > > > > containing around 400 tasks (not using TaskGroup or SubDagOperator),
> > > > > > opening the whole dag in Graph View usually takes 5 seconds. Less than 60ms
> > > > > > of that is taken by loading the data from webserver. The remaining 4.9s+ is
> > > > > > taken by javascript functions in dagre-d3.min.js such as createNodes,
> > > > > > createEdgeLabels, etc and by rendering the graph. With TaskGroup being used
> > > > > > to group tasks into a smaller number of top-level nodes, the amount of data
> > > > > > loaded from webserver will remain about the same compared to a flat dag of
> > > > > > the same size, but the number of nodes and edges needed to be plotted on the
> > > > > > graph can be reduced significantly. So in theory this should speed up the
> > > > > > time it takes to open Graph View even without lazy-loading the data (I'll
> > > > > > experiment to find out). That said, if it comes to a point lazy-loading
> > > > > > helps, we can still implement it as an improvement.
> > > > > >
> > > > > > Re James: the Tree View looks as if all the groups are fully expanded
> > > > > > (because under the hood all the tasks are in a single DAG). I'm less
> > > > > > worried about Tree View at the moment because it already has a mechanism
> > > > > > for collapsing tasks by the dependency tree. That said, the Tree View can
> > > > > > definitely be improved too with TaskGroup (e.g. collapse tasks in the same
> > > > > > TaskGroup when Tree View is first opened).
> > > > > >
> > > > > > For both suggestions, implementing them doesn't require fundamental changes
> > > > > > to the idea. I think we can have a basic working TaskGroup first, and then
> > > > > > improve it incrementally in several PRs as we get more feedback from the
> > > > > > community. What do you think?
> > > > > >
> > > > > > Qian
> > > > > >
> > > > > >
> > > > > > On Wed, Aug 12, 2020 at 9:15 AM James Coder <jc...@gmail.com> wrote:
> > > > > >
> > > > > > > I agree this looks great, one question, how does the tree view look?
> > > > > > >
> > > > > > > James Coder
> > > > > > >
> > > > > > > > On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gcasassaez@twitter.com.invalid> wrote:
> > > > > > > >
> > > > > > > > First of all, this is awesome!!
> > > > > > > >
> > > > > > > > Secondly, checking your UI code, seems you are loading all operators at
> > > > > > > > once. Wondering if we can load them as needed (aka load whenever we click
> > > > > > > > the TaskGroup). Some of our DAGs are so large that take forever to load on
> > > > > > > > the Graph view, so worried about this still being an issue here. It may be
> > > > > > > > easily solvable by implementing lazy loading of the graph. Not sure how
> > > > > > > > easy to implement/add to the UI extension (and dont want to push for early
> > > > > > > > optimization as its the root of all evil).
> > > > > > > > Gerard Casas Saez
> > > > > > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > > > > >
> > > > > > > >
> > > > > > > >> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> Hi Yu,
> > > > > > > >>
> > > > > > > >> Thank you so much for taking on this. I was fairly distracted previously
> > > > > > > >> and I didn't have the time to update the proposal. In fact, after
> > > > > > > >> discussing with Ash, Kaxil and Daniel, the direction of this AIP has been
> > > > > > > >> changed to favor the concept of TaskGroup instead of rewriting
> > > > > > > >> SubDagOperator (though it may make sense to deprecate SubDag at a future
> > > > > > > >> date).
> > > > > > > >>
> > > > > > > >> Your PR is amazing and it has implemented the desired features. I think we
> > > > > > > >> can focus on your new PR instead. Do you mind updating the AIP based on
> > > > > > > >> what you have done in your PR?
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Bin
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <
> > yuqian1990@gmail.com>
> > > > > > wrote:
> > > > > > > >>>
> > > > > > > >>> Hi, all, I've added the basic UI changes to my proposed
> > > > > > implementation
> > > > > > > of
> > > > > > > >>> TaskGroup as UI grouping concept:
> > > > > > > >>> https://github.com/apache/airflow/pull/10153
> > > > > > > >>>
> > > > > > > >>> I think Chris had a pretty good specification of TaskGroup
> so
> > > > I'm
> > > > > > > quoting
> > > > > > > >>> it here. The only thing I don't fully agree with is the
> > > > restriction
> > > > > > > > >>> "... *cannot* have dependencies between a Task in a TaskGroup and
> > > > > > > > >>> either a Task in a different TaskGroup or a Task not in any
> > > > > > > > >>> group". I think this is overly restrictive. Since TaskGroup is a UI concept,
> > > tasks
> > > > > can
> > > > > > > have
> > > > > > > >>> dependencies on tasks in other TaskGroup or not in any
> > > TaskGroup.
> > > > > In
> > > > > > my
> > > > > > > >> PR,
> > > > > > > >>> this is allowed. The graph edges will update accordingly
> when
> > > > > > > TaskGroups
> > > > > > > >>> are expanded/collapsed. TaskGroup is only helping to make
> the
> > > UI
> > > > > look
> > > > > > > >> less
> > > > > > > >>> crowded. Under the hood, everything is still a DAG of tasks
> > and
> > > > > edges
> > > > > > > so
> > > > > > > >>> things work normally. Here's a screenshot
> > > > > > > >>> <
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif
> > > > > > > >>>>
> > > > > > > >>> of the UI interaction.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> *   - Tasks can be added to a TaskGroup   - You *can* have
> > > > > > dependencies
> > > > > > > >>> between Tasks in the same TaskGroup, but   *cannot* have
> > > > > dependencies
> > > > > > > >>> between a Task in a TaskGroup and either a   Task in a
> > > different
> > > > > > > >> TaskGroup
> > > > > > > >>> or a Task not in any group   - You *can* have dependencies
> > > > between
> > > > > a
> > > > > > > >>> TaskGroup and either other   TaskGroups or Tasks not in any
> > > group
> > > > >  -
> > > > > > > The
> > > > > > > >>> UI will by default render a TaskGroup as a single "object",
> > but
> > > > > >  which
> > > > > > > >> you
> > > > > > > >>> expand or zoom into in some way   - You'd need some way to
> > > > > determine
> > > > > > > what
> > > > > > > >>> the "status" of a TaskGroup was   at least for UI display
> > > > purposes*
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> Regarding Jake's comment, I agree it's possible to
> implement
> > > the
> > > > > > > >> "retrying
> > > > > > > >>> tasks in a group" pattern he mentioned as an optional
> feature
> > > of
> > > > > > > >> TaskGroup
> > > > > > > >>> although that may go against having TaskGroup as a pure UI
> > > > concept.
> > > > > > For
> > > > > > > >> the
> > > > > > > >>> motivating example Jake provided, I suggest implementing
> both
> > > > > > > >>> SubmitLongRunningJobTask and PollJobStatusSensor in a
> single
> > > > > > operator.
> > > > > > > It
> > > > > > > >>> can do something like BaseSensorOperator.execute() does in
> > > > > > "reschedule"
> > > > > > > >>> mode, i.e. it first executes some code to submit the long
> > > running
> > > > > job
> > > > > > > to
> > > > > > > > >>> the external service and stores the state (e.g. in XCom). It then
> > > > > > > > >>> reschedules itself. Subsequent runs then poke for the completion state.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> > > > > > > >> <jferriero@google.com.invalid
> > > > > > > >>>>
> > > > > > > >>> wrote:
> > > > > > > >>>
> > > > > > > >>>> I really like this idea of a TaskGroup container as I
> think
> > > this
> > > > > > will
> > > > > > > >> be
> > > > > > > >>>> much easier to use than SubDag.
> > > > > > > >>>>
> > > > > > > >>>> I'd like to propose an optional behavior for special retry
> > > > > mechanics
> > > > > > > >> via
> > > > > > > >>> a
> > > > > > > >>>> TaskGroup.retry_all property.
> > > > > > > >>>> This way I could use TaskGroup to replace my favorite use
> of
> > > > > SubDag
> > > > > > > for
> > > > > > > >>>> atomically retrying tasks of the pattern "act on external
> > > state
> > > > > then
> > > > > > > >>>> reschedule poll until desired state reached".
> > > > > > > >>>>
> > > > > > > > >>>> The motivating use case I have for a SubDag is a very simple two-task group
> > > > > > > > >>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > > > > > > >>>> I use SubDag because it gives me an easy way to retry the SubmitJobTask
> > > > > > > > >>>> if something about the PollJobSensor fails.
> > > > > > > >>>> This pattern would be really nice for jobs that are
> expected
> > > to
> > > > > run
> > > > > > a
> > > > > > > >>> long
> > > > > > > > >>>> time (because the sensor can use reschedule mode, freeing up slots)
> > > > > > > >>>> but might fail for a retryable reason.
> > > > > > > >>>> However, using SubDag to meet this use case defeats the
> > > purpose
> > > > > > > because
> > > > > > > >>>> SubDag infamously
> > > > > > > >>>> <
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10
> > > > > > > >>>>>
> > > > > > > >>>> blocks a "controller" slot for the entire duration.
> > > > > > > > >>>> This may feel like a cyclic behavior but in reality it is
> very
> > > > common
> > > > > > for
> > > > > > > >> a
> > > > > > > > >>>> single operator to submit a job / wait until done.
> > > > > > > > >>>> We could use this to refactor many operators (e.g. BQ,
> > > > Dataproc,
> > > > > > > >>>> Dataflow) to be implemented as TaskGroup[SubmitTask >>
> > > PollTask]
> > > > > > with
> > > > > > > >> an
> > > > > > > >>>> optional reschedule mode if user knows that this job may
> > take
> > > a
> > > > > long
> > > > > > > >>> time.
> > > > > > > >>>>
> > > > > > > > >>>> I'd be happy to do the development work on adding this
> specific
> > > > retry
> > > > > > > >>> behavior
> > > > > > > >>>> to TaskGroup once the base concept is implemented if
> others
> > in
> > > > the
> > > > > > > >>>> community would find this a useful feature.
> > > > > > > >>>>
> > > > > > > >>>> Cheers,
> > > > > > > >>>> Jake
> > > > > > > >>>>
> > > > > > > >>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> > > > > > > Jarek.Potiuk@polidea.com
> > > > > > > >>>
> > > > > > > >>>> wrote:
> > > > > > > >>>>
> > > > > > > > >>>>> All for it :). I think we are getting closer to having
> > regular
> > > > > > > >> planning
> > > > > > > >>>> and
> > > > > > > >>>>> making some structured approach to 2.0 and starting task
> > > force
> > > > > for
> > > > > > it
> > > > > > > >>>> soon,
> > > > > > > >>>>> so I think this should be perfectly fine to discuss and
> > even
> > > > > start
> > > > > > > >>>>> implementing what's beyond as soon as we make sure that
> we
> > > are
> > > > > > > >>>> prioritizing
> > > > > > > >>>>> 2.0 work.
> > > > > > > >>>>>
> > > > > > > >>>>> J,
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <
> > > yuqian1990@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>>>>
> > > > > > > >>>>>> Hi Jarek,
> > > > > > > >>>>>>
> > > > > > > >>>>>> I agree we should not change the behaviour of the
> existing
> > > > > > > >>>> SubDagOperator
> > > > > > > >>>>>> till Airflow 2.1. Is it okay to continue the discussion
> > > about
> > > > > > > >>> TaskGroup
> > > > > > > >>>>> as
> > > > > > > >>>>>> a brand new concept/feature independent from the
> existing
> > > > > > > >>>> SubDagOperator?
> > > > > > > >>>>>> In other words, shall we add TaskGroup as a UI grouping
> > > > concept
> > > > > > > >> like
> > > > > > > >>>> Ash
> > > > > > > > >>>>>> suggested, and not touch SubDagOperator at all?
> Whenever
> > we
> > > > are
> > > > > > > >>> ready
> > > > > > > >>>>> with
> > > > > > > >>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow
> > 2.1.
> > > > > > > >>>>>>
> > > > > > > >>>>>> I really like Ash's idea of simplifying the
> SubDagOperator
> > > > idea
> > > > > > > >> into
> > > > > > > >>> a
> > > > > > > >>>>>> simple UI grouping concept. I think Xinbin's idea of
> > > > > "reattaching
> > > > > > > >> all
> > > > > > > >>>> the
> > > > > > > >>>>>> tasks to the root DAG" is the way to go. And I see James
> > > > pointed
> > > > > > > >> out
> > > > > > > >>> we
> > > > > > > >>>>>> need some helper functions to simplify dependencies
> > setting
> > > of
> > > > > > > >>>> TaskGroup.
> > > > > > > >>>>>> Xinbin put up a pretty elegant example in his PR
> > > > > > > >>>>>> <https://github.com/apache/airflow/pull/9243>. I think
> > > having
> > > > > > > >>>> TaskGroup
> > > > > > > >>>>> as
> > > > > > > >>>>>> a UI concept should be a relatively small change. We can
> > > > > simplify
> > > > > > > >>>>> Xinbin's
> > > > > > > >>>>>> PR further. So I put up this alternative proposal here:
> > > > > > > >>>>>> https://github.com/apache/airflow/pull/10153
> > > > > > > >>>>>>
> > > > > > > >>>>>> I have not done any UI changes due to lack of experience
> > > with
> > > > > web
> > > > > > > >> UI.
> > > > > > > >>>> If
> > > > > > > >>>>>> anyone's interested, please take a look at the PR.
> > > > > > > >>>>>>
> > > > > > > >>>>>> Qian
> > > > > > > >>>>>>
> > > > > > > >>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > > > > > > >>> Jarek.Potiuk@polidea.com
> > > > > > > >>>>>
> > > > > > > >>>>>> wrote:
> > > > > > > >>>>>>
> > > > > > > >>>>>>> Similar point here to the other ideas that are popping
> > up.
> > > > > Maybe
> > > > > > > >> we
> > > > > > > >>>>>> should
> > > > > > > >>>>>>> just focus on completing 2.0 and make all discussions
> > about
> > > > > > > >> further
> > > > > > > >>>>>>> improvements to 2.1? While those are important
> > discussions
> > > > (and
> > > > > > > >> we
> > > > > > > >>>>> should
> > > > > > > > >>>>>>> continue them in the near future!) I think at this
> > point
> > > > > > > >> focusing
> > > > > > > >>>> on
> > > > > > > >>>>>>> delivering 2.0 in its current shape should be our focus
> > > now?
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> J.
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > > > > > > >>> bin.huangxb@gmail.com>
> > > > > > > >>>>>>> wrote:
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>> Hi Daniel
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> I agree that the TaskGroup should have the same API
> as a
> > > DAG
> > > > > > > >>> object
> > > > > > > >>>>>>> related
> > > > > > > >>>>>>>> to task dependencies, but it will not have anything
> > > related
> > > > to
> > > > > > > >>>> actual
> > > > > > > >>>>>>>> execution or scheduling.
> > > > > > > >>>>>>>> I will update the AIP according to this over the
> > weekend.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>> We could even make a “DAGTemplate” object s.t. when
> you
> > > > > > > >> import
> > > > > > > >>>> the
> > > > > > > >>>>>>> object
> > > > > > > >>>>>>>> you can import it with parameters to determine the
> shape
> > > of
> > > > > the
> > > > > > > >>>> DAG.
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> Can you elaborate a bit more on this? Does it serve a
> > > > similar
> > > > > > > >>>> purpose
> > > > > > > >>>>>> as
> > > > > > > >>>>>>> a
> > > > > > > >>>>>>>> DAG factory function?
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > > > > > > >>>>>>> daniel.imberman@gmail.com
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>> wrote:
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>> Hi Bin,
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> Why not give the TaskGroup the same API as a DAG
> object
> > > > (e.g.
> > > > > > > >>> the
> > > > > > > >>>>>>> bitwise
> > > > > > > > >>>>>>>>> operator for task dependencies)? We could even make a
> > > > > > > >>>> “DAGTemplate”
> > > > > > > >>>>>>>> object
> > > > > > > >>>>>>>>> s.t. when you import the object you can import it
> with
> > > > > > > >>> parameters
> > > > > > > >>>>> to
> > > > > > > >>>>>>>>> determine the shape of the DAG.
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > > > > > > >>>>> bin.huangxb@gmail.com
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>>> wrote:
> > > > > > > >>>>>>>>> The TaskGroup will not take schedule interval as a
> > > > parameter
> > > > > > > >>>>> itself,
> > > > > > > >>>>>>> and
> > > > > > > >>>>>>>> it
> > > > > > > > >>>>>>>>> depends on the DAG it attaches to. In my
> opinion,
> > > the
> > > > > > > >>>>> TaskGroup
> > > > > > > >>>>>>>> will
> > > > > > > >>>>>>>>> only contain a group of tasks with interdependencies,
> > and
> > > > the
> > > > > > > >>>>>> TaskGroup
> > > > > > > >>>>>>>>> behaves like a task. It doesn't contain any
> > > > > > > >>> execution/scheduling
> > > > > > > >>>>>> logic
> > > > > > > >>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs
> > > etc.)
> > > > > > > >>> like
> > > > > > > >>>> a
> > > > > > > >>>>>> DAG
> > > > > > > >>>>>>>>> does.
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>>> For example, there is the scenario that the schedule
> > > > > > > >> interval
> > > > > > > >>>> of
> > > > > > > >>>>>> DAG
> > > > > > > >>>>>>> is
> > > > > > > >>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20
> > min.
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> I am curious why you ask this. Is this a use case
> that
> > > you
> > > > > > > >> want
> > > > > > > >>>> to
> > > > > > > >>>>>>>> achieve?
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> Bin
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> > > > > > > >> thanosxnicholas@gmail.com
> > > > > > > >>>>
> > > > > > > >>>>>> wrote:
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>>> Hi Bin,
> > > > > > > > >>>>>>>>>> Using TaskGroup, is the schedule interval of
> TaskGroup
> > > the
> > > > > > > >>> same
> > > > > > > >>>>> as
> > > > > > > >>>>>>> the
> > > > > > > >>>>>>>>>> parent DAG? My main concern is whether the schedule
> > > > > > > >> interval
> > > > > > > >>> of
> > > > > > > >>>>>>>> TaskGroup
> > > > > > > > >>>>>>>>>> could be different from that of the DAG. For
> example,
> > > > there
> > > > > > > >>> is
> > > > > > > >>>>> the
> > > > > > > >>>>>>>>> scenario
> > > > > > > >>>>>>>>>> that the schedule interval of DAG is 1 hour and the
> > > > > > > >> schedule
> > > > > > > >>>>>> interval
> > > > > > > >>>>>>>> of
> > > > > > > >>>>>>>>>> TaskGroup is 20 min.
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Cheers,
> > > > > > > >>>>>>>>>> Nicholas
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > > > > > > >>>>>> bin.huangxb@gmail.com
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>>> Hi Nicholas,
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> I am not sure about the old behavior of
> > SubDagOperator,
> > > > > > > >>> maybe
> > > > > > > >>>>> it
> > > > > > > >>>>>>> will
> > > > > > > >>>>>>>>>> throw
> > > > > > > >>>>>>>>>>> an error? But in the original proposal, the
> subdag's
> > > > > > > >>>>>>>> schedule_interval
> > > > > > > >>>>>>>>>> will
> > > > > > > >>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to
> > replace
> > > > > > > >>>> SubDag,
> > > > > > > >>>>>>> there
> > > > > > > >>>>>>>>>> will
> > > > > > > >>>>>>>>>>> be no subdag schedule_interval.
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> Bin
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > > > > > > >>>> thanosxnicholas@gmail.com
> > > > > > > >>>>>>
> > > > > > > >>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> Hi Bin,
> > > > > > > >>>>>>>>>>>> Thanks for your good proposal. I was confused
> > whether
> > > > > > > >> the
> > > > > > > >>>>>>> schedule
> > > > > > > >>>>>>>>>>>> interval of SubDAG is different from that of the
> > > parent
> > > > > > > >>>> DAG?
> > > > > > > >>>>> I
> > > > > > > >>>>>>> have
> > > > > > > >>>>>>>>>>>> discussed with Jiajie Zhong about the schedule
> > > interval
> > > > > > > >>> of
> > > > > > > >>>>>>> SubDAG.
> > > > > > > >>>>>>>> If
> > > > > > > >>>>>>>>>> the
> > > > > > > >>>>>>>>>>>> SubDagOperator has a different schedule interval,
> > what
> > > > > > > >>> will
> > > > > > > >>>>>>> happen
> > > > > > > >>>>>>>>> for
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>> scheduler to schedule the parent DAG?
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> Regards,
> > > > > > > >>>>>>>>>>>> Nicholas Jiang
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > > > > > >>>>>>>> bin.huangxb@gmail.com>
> > > > > > > >>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback!
> > > > > > > >>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>> I have rethought the concept of subdag and
> > task
> > > > > > > >>>>>> groups. I
> > > > > > > >>>>>>>>> think
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>> better way to approach this is to entirely remove
> > > > > > > >>> subdag
> > > > > > > >>>>> and
> > > > > > > >>>>>>>>>> introduce
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>> concept of TaskGroup, which is a container of
> tasks
> > > > > > > >>> along
> > > > > > > >>>>>> with
> > > > > > > >>>>>>>>> their
> > > > > > > >>>>>>>>>>>>> dependencies *without execution/scheduling logic
> > as a
> > > > > > > >>>> DAG*.
> > > > > > > >>>>>> The
> > > > > > > >>>>>>>>> only
> > > > > > > >>>>>>>>>>>>> purpose of it is to group a list of tasks, but
> you
> > > > > > > >>> still
> > > > > > > >>>>> need
> > > > > > > >>>>>>> to
> > > > > > > >>>>>>>>> add
> > > > > > > >>>>>>>>>> it
> > > > > > > >>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>> a DAG for execution.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Here is a small code snippet.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> ```
> > > > > > > >>>>>>>>>>>>> class TaskGroup:
> > > > > > > >>>>>>>>>>>>> """
> > > > > > > >>>>>>>>>>>>> A TaskGroup contains a group of tasks.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> If default_args is missing, it will take default
> > args
> > > > > > > >>>> from
> > > > > > > >>>>>> the
> > > > > > > >>>>>>>>>> DAG.
> > > > > > > >>>>>>>>>>>>> """
> > > > > > > >>>>>>>>>>>>> def __init__(self, group_id, default_args):
> > > > > > > >>>>>>>>>>>>> pass
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> """
> > > > > > > >>>>>>>>>>>>> You can add tasks to a task group similar to
> adding
> > > > > > > >>> tasks
> > > > > > > >>>>> to
> > > > > > > >>>>>> a
> > > > > > > >>>>>>>> DAG
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> This can be declared in a separate file from the
> > dag
> > > > > > > >>> file
> > > > > > > >>>>>>>>>>>>> """
> > > > > > > >>>>>>>>>>>>> download_group = TaskGroup(group_id='download',
> > > > > > > >>>>>>>>>>>> default_args=default_args)
> > > > > > > >>>>>>>>>>>>> download_group.add_task(task1)
> > > > > > > >>>>>>>>>>>>> task2.dag = download_group
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> with download_group:
> > > > > > > >>>>>>>>>>>>> task3 = DummyOperator(task_id='task3')
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> [task, task2] >> task3
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> """Add it to a DAG for execution"""
> > > > > > > >>>>>>>>>>>>> with DAG(dag_id='start_download_dag',
> > > > > > > >>>>>>> default_args=default_args,
> > > > > > > >>>>>>>>>>>>> schedule_interval='@daily', ...) as dag:
> > > > > > > >>>>>>>>>>>>> start = DummyOperator(task_id='start')
> > > > > > > >>>>>>>>>>>>> start >> download_group
> > > > > > > >>>>>>>>>>>>> # this is equivalent to
> > > > > > > >>>>>>>>>>>>> # start >> [task, task2] >> task3
> > > > > > > >>>>>>>>>>>>> ```
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> With this, we can still reuse a group of tasks
> and
> > > > > > > >> set
> > > > > > > >>>>>>>> dependencies
> > > > > > > >>>>>>>>>>>> between
> > > > > > > >>>>>>>>>>>>> them; it avoids the boilerplate code from using
> > > > > > > >>>>>> SubDagOperator,
> > > > > > > >>>>>>>> and
> > > > > > > >>>>>>>>>> we
> > > > > > > >>>>>>>>>>>> can
> > > > > > > >>>>>>>>>>>>> declare dependencies as `task >> task_group >>
> > task`.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> User migration wise, we can introduce it before
> > > > > > > >> Airflow
> > > > > > > >>>> 2.0
> > > > > > > >>>>>> and
> > > > > > > >>>>>>>>> allow
> > > > > > > >>>>>>>>>>>>> gradual transition. Then we can decide if we
> still
> > > > > > > >> want
> > > > > > > >>>> to
> > > > > > > >>>>>> keep
> > > > > > > >>>>>>>> the
> > > > > > > >>>>>>>>>>>>> SubDagOperator or simply remove it.
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Any thoughts?
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> Cheers,
> > > > > > > >>>>>>>>>>>>> Bin
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime
> Beauchemin <
> > > > > > > >>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> +1, proposal looks good.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> The original intention was really to have tasks
> > > > > > > >>> groups
> > > > > > > >>>>> and
> > > > > > > >>>>>> a
> > > > > > > >>>>>>>>>>>> zoom-in/out
> > > > > > > >>>>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>> the UI. The original reasoning was to reuse the
> > DAG
> > > > > > > >>>>> object
> > > > > > > >>>>>>>> since
> > > > > > > >>>>>>>>> it
> > > > > > > >>>>>>>>>>> is
> > > > > > > >>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> > > > > > > >>> create
> > > > > > > >>>>>>>> underlying
> > > > > > > >>>>>>>>>>>>>> confusions since a DAG is much more than just a
> > > > > > > >> group
> > > > > > > >>>> of
> > > > > > > >>>>>>> tasks.
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> Max
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > > > > > >>>>>>>>>>>>> joshipoornima06@gmail.com>
> > > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> Thank you for your email.
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > > > > > >>>>>>>>>>> bin.huangxb@gmail.com
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> > > > > > > >>>>>> rewrites
> > > > > > > >>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > > > >>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > > > > > > >> it
> > > > > > > >>>>> will
> > > > > > > >>>>>>>> give a
> > > > > > > >>>>>>>>>>>> flat
> > > > > > > >>>>>>>>>>>>>>>>>> structure at
> > > > > > > >>>>>>>>>>>>>>>>>> the task level
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> The serialized_dag representation already
> > > > > > > >> does
> > > > > > > >>>>> this I
> > > > > > > >>>>>>>>> think.
> > > > > > > >>>>>>>>>> At
> > > > > > > >>>>>>>>>>>>> least
> > > > > > > >>>>>>>>>>>>>>> if
> > > > > > > >>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > > > > > > >>> representation,
> > > > > > > >>>>> but
> > > > > > > >>>>>> at
> > > > > > > >>>>>>>>> least
> > > > > > > >>>>>>>>>>> it
> > > > > > > >>>>>>>>>>>>> will
> > > > > > > >>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
> > > > > > > >> In
> > > > > > > >>> my
> > > > > > > >>>>>>>> proposal
> > > > > > > >>>>>>>>> as
> > > > > > > >>>>>>>>>>>> also
> > > > > > > >>>>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
> > > > > > > >> from
> > > > > > > >>>> the
> > > > > > > >>>>>>> subdag
> > > > > > > >>>>>>>>> and
> > > > > > > >>>>>>>>>>> add
> > > > > > > >>>>>>>>>>>>>> them
> > > > > > > >>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG
> graph
> > > > > > > >>>> will
> > > > > > > >>>>>> look
> > > > > > > >>>>>>>>>> exactly
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>> same as without subdag but with metadata
> > > > > > > >> attached
> > > > > > > >>>> to
> > > > > > > >>>>>>> those
> > > > > > > >>>>>>>>>>>> sections.
> > > > > > > >>>>>>>>>>>>>>> These
> > > > > > > > >>>>>>>>>>>>>>>> metadata will later be used for rendering in
> the
> > > > > > > >>> UI.
> > > > > > > >>>>> So
> > > > > > > >>>>>>>> after
> > > > > > > >>>>>>>>>>>> parsing
> > > > > > > >>>>>>>>>>>>> (
> > > > > > > >>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
> > > > > > > >> the
> > > > > > > >>>>>>> *root_dag
> > > > > > > >>>>>>>>>>>> *instead
> > > > > > > >>>>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>>>> *root_dag +
> > > > > > > >>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > > > > > > >>>>>>>>>> current_group=section-1,
> > > > > > > >>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> > > > > > > >>> naming
> > > > > > > >>>>>>>>>>> suggestions),
> > > > > > > >>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>> reason for parent_group is that we can have
> > > > > > > >>> nested
> > > > > > > >>>>>> group
> > > > > > > >>>>>>>> and
> > > > > > > >>>>>>>>>>>> still
> > > > > > > >>>>>>>>>>>>>> be
> > > > > > > >>>>>>>>>>>>>>>> able to capture the dependency.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> Runtime DAG:
> > > > > > > >>>>>>>>>>>>>>>> [image: image.png]
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> While at the UI, what we see would be
> something
> > > > > > > >>>> like
> > > > > > > >>>>>> this
> > > > > > > >>>>>>>> by
> > > > > > > >>>>>>>>>>>>> utilizing
> > > > > > > >>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into
> > > > > > > >> in
> > > > > > > >>>> some
> > > > > > > >>>>>>> way.
> > > > > > > >>>>>>>>>>>>>>>> [image: image.png]
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>> The benefits I can see are:
> > > > > > > >>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > > > > > > >>> complexity
> > > > > > > >>>> of
> > > > > > > >>>>>>>> SubDag
> > > > > > > >>>>>>>>>> for
> > > > > > > >>>>>>>>>>>>>>> execution
> > > > > > > >>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> > > > > > > >> using
> > > > > > > >>>>>> SubDag.
> > > > > > > >>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
> > > > > > > >>>>> reusable
> > > > > > > >>>>>>> dag
> > > > > > > >>>>>>>>> code
> > > > > > > >>>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>>>>> declare dependencies between them. And with
> the
> > > > > > > >>> new
> > > > > > > >>>>>>>>>>> SubDagOperator
> > > > > > > >>>>>>>>>>>>> (see
> > > > > > > >>>>>>>>>>>>>>> AIP
> > > > > > > >>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
> > > > > > > >>>>> function
> > > > > > > >>>>>>> for
> > > > > > > >>>>>>>>>>>>> generating 1
> > > > > > > >>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
> > > > > > > >>> (in
> > > > > > > >>>>> this
> > > > > > > >>>>>>>> case,
> > > > > > > >>>>>>>>>> it
> > > > > > > >>>>>>>>>>>> will
> > > > > > > >>>>>>>>>>>>>>> just
> > > > > > > >>>>>>>>>>>>>>>> extract all underlying tasks and append to the
> > > > > > > >>> root
> > > > > > > >>>>>> dag).
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> > > > > > > >>>> with a
> > > > > > > >>>>>>>>>> simpler
> > > > > > > >>>>>>>>>>>>>> concept
> > > > > > > >>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> > > > > > > >> out
> > > > > > > >>>> the
> > > > > > > >>>>>>>>>> contents
> > > > > > > >>>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>> SubDag
> > > > > > > >>>>>>>>>>>>>>>> and becomes more like
> > > > > > > >>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > > > > > > >>>>>>>>>>>>>>> (forgive
> > > > > > > > >>>>>>>>>>>>>>>> me for the crazy name...). In this case, is it still necessary to
> > > > > > > > >>>>>>>>>>>>>>>> keep the concept of subdag, as it is nothing more than a name?
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> > > > > > > >>>> Chris
> > > > > > > >>>>>>> Palmer
> > > > > > > >>>>>>>>> for
> > > > > > > >>>>>>>>>>>>> helping
> > > > > > > >>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup,
> I
> > > > > > > >>>> will
> > > > > > > >>>>>> just
> > > > > > > >>>>>>>>> paste
> > > > > > > >>>>>>>>>>> it
> > > > > > > >>>>>>>>>>>>>> here.
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > > > > > > >> in
> > > > > > > >>>> the
> > > > > > > >>>>>> same
> > > > > > > >>>>>>>>>>>> TaskGroup,
> > > > > > > >>>>>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > > > > > > >> a
> > > > > > > >>>>>>> TaskGroup
> > > > > > > >>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>>> either a
> > > > > > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > > > > > > >> in
> > > > > > > >>>> any
> > > > > > > >>>>>>> group
> > > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > > > > >>> TaskGroup
> > > > > > > >>>>> and
> > > > > > > >>>>>>>>>> either
> > > > > > > >>>>>>>>>>>>> other
> > > > > > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > > > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > > > > >> as
> > > > > > > >>> a
> > > > > > > >>>>>> single
> > > > > > > >>>>>>>>>>>> "object",
> > > > > > > >>>>>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > > > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > > > > >>>>> "status"
> > > > > > > >>>>>>> of a
> > > > > > > >>>>>>>>>>>>> TaskGroup
> > > > > > > >>>>>>>>>>>>>>> was
> > > > > > > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> I agree with Chris:
> > > > > > > >>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > > > > > > >>> executor), I
> > > > > > > >>>>>> think
> > > > > > > >>>>>>>>>>> TaskGroup
> > > > > > > >>>>>>>>>>>>>>> should
> > > > > > > >>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> > > > > > > >> to
> > > > > > > >>>>>>> implement
> > > > > > > >>>>>>>>>> some
> > > > > > > >>>>>>>>>>>>>> metadata
> > > > > > > > >>>>>>>>>>>>>>>> operations that allow starting/stopping a group of
> > > > > > > >>> tasks
> > > > > > > >>>>>> etc.)
> > > > > > > > >>>>>>>>>>>>>>>> - From the UI's view, it should be able to
> pick
> > > > > > > >>> up
> > > > > > > >>>>> the
> > > > > > > >>>>>>>>>> individual
> > > > > > > >>>>>>>>>>>>>> tasks'
> > > > > > > >>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > > > > > > >> status
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> Bin
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > > > > > > >> Imberman
> > > > > > > >>> <
> > > > > > > >>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>`
> operator
> > > > > > > >>> to
> > > > > > > >>>>> tie
> > > > > > > >>>>>>> dags
> > > > > > > >>>>>>>>>>>> together
> > > > > > > >>>>>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>>>>> I
> > > > > > > >>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if
> we
> > > > > > > >>>> could
> > > > > > > >>>>>>>>>> essentially
> > > > > > > >>>>>>>>>>>>> write
> > > > > > > >>>>>>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > > > > > > >>>> starter-tasks
> > > > > > > >>>>>> for
> > > > > > > >>>>>>>>> that
> > > > > > > >>>>>>>>>>> DAG.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
> > > > > > > >> UI
> > > > > > > >>>>>> concept.
> > > > > > > >>>>>>>> It
> > > > > > > >>>>>>>>>>>> doesn’t
> > > > > > > >>>>>>>>>>>>>> need
> > > > > > > >>>>>>>>>>>>>>>>> to execute separately, you’re just adding
> more
> > > > > > > >>>> tasks
> > > > > > > >>>>>> to
> > > > > > > >>>>>>>> the
> > > > > > > >>>>>>>>>>> queue
> > > > > > > >>>>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>>>> will
> > > > > > > >>>>>>>>>>>>>>>>> be executed when there are resources
> > > > > > > >> available.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
> > > > > > > >> <
> > > > > > > >>>>>>>>>>> chris@crpalmer.com
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
> > > > > > > >>>>>> abstraction.
> > > > > > > >>>>>>> I
> > > > > > > >>>>>>>>>> think
> > > > > > > >>>>>>>>>>>> what
> > > > > > > >>>>>>>>>>>>>> is
> > > > > > > >>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
> > > > > > > >> high
> > > > > > > >>>>> level
> > > > > > > >>>>>> I
> > > > > > > >>>>>>>>> think
> > > > > > > >>>>>>>>>>> you
> > > > > > > >>>>>>>>>>>>> want
> > > > > > > >>>>>>>>>>>>>>>>> this
> > > > > > > >>>>>>>>>>>>>>>>> functionality:
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> in
> > > > > > > >>> the
> > > > > > > >>>>>> same
> > > > > > > >>>>>>>>>>> TaskGroup,
> > > > > > > >>>>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> a
> > > > > > > >>>>>> TaskGroup
> > > > > > > >>>>>>>> and
> > > > > > > >>>>>>>>>>>> either
> > > > > > > >>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> in
> > > > > > > >>> any
> > > > > > > >>>>>> group
> > > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > > > > >>> TaskGroup
> > > > > > > >>>>> and
> > > > > > > >>>>>>>> either
> > > > > > > >>>>>>>>>>> other
> > > > > > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > > > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > > > > >> as a
> > > > > > > >>>>>> single
> > > > > > > >>>>>>>>>>> "object",
> > > > > > > >>>>>>>>>>>>> but
> > > > > > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > > > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > > > > >>>> "status"
> > > > > > > >>>>>> of
> > > > > > > >>>>>>> a
> > > > > > > >>>>>>>>>>>> TaskGroup
> > > > > > > >>>>>>>>>>>>>> was
> > > > > > > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
> > > > > > > >>> object
> > > > > > > >>>>>> with
> > > > > > > >>>>>>>> its
> > > > > > > >>>>>>>>>> own
> > > > > > > >>>>>>>>>>>>>> database
> > > > > > > >>>>>>>>>>>>>>>>> table and model or just another attribute on
> > > > > > > >>>> tasks.
> > > > > > > >>>>> I
> > > > > > > >>>>>>>> think
> > > > > > > >>>>>>>>>> you
> > > > > > > >>>>>>>>>>>>> could
> > > > > > > >>>>>>>>>>>>>>>>> build
> > > > > > > >>>>>>>>>>>>>>>>> it in a way such that from the schedulers
> > > > > > > >> point
> > > > > > > >>> of
> > > > > > > >>>>>> view
> > > > > > > >>>>>>> a
> > > > > > > >>>>>>>>> DAG
> > > > > > > >>>>>>>>>>> with
> > > > > > > >>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
> > > > > > > >> differently.
> > > > > > > >>> So
> > > > > > > >>>>> it
> > > > > > > >>>>>>>> really
> > > > > > > >>>>>>>>>>> just
> > > > > > > >>>>>>>>>>>>>>> becomes
> > > > > > > >>>>>>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>>> shortcut for setting dependencies between
> sets
> > > > > > > >>> of
> > > > > > > >>>>>> Tasks,
> > > > > > > >>>>>>>> and
> > > > > > > >>>>>>>>>>>> allows
> > > > > > > >>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> UI
> > > > > > > >>>>>>>>>>>>>>>>> to simplify the render of the DAG structure.
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> Chris
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > > > > > >>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
> > > > > > > >> the
> > > > > > > >>>> more
> > > > > > > >>>>>>>>> important
> > > > > > > >>>>>>>>>>>> issue
> > > > > > > >>>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>> fix),
> > > > > > > >>>>>>>>>>>>>>>>>> but I am still convinced Ash' idea is the
> > > > > > > >>> right
> > > > > > > >>>>> way
> > > > > > > >>>>>>>>> forward
> > > > > > > >>>>>>>>>>>> (just
> > > > > > > >>>>>>>>>>>>> it
> > > > > > > >>>>>>>>>>>>>>>>> might
> > > > > > > >>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
> > > > > > > >>> adding
> > > > > > > >>>>>>> visual
> > > > > > > >>>>>>>>>>> grouping
> > > > > > > >>>>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>> UI).
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> There was a previous thread about this FYI
> > > > > > > >>> with
> > > > > > > >>>>> more
> > > > > > > >>>>>>>>> context
> > > > > > > >>>>>>>>>>> on
> > > > > > > >>>>>>>>>>>>> why
> > > > > > > >>>>>>>>>>>>>>>>> subdags
> > > > > > > >>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>
> > > https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > > > > >>>>>>>>>>>>>> . A
> > > > > > > > >>>>>>>>>>>>>>>>>> solution I outline there to James's problem
> > > > > > > >> is
> > > > > > > >>>> e.g.
> > > > > > > >>>>>>>>> enabling
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>> operator
> > > > > > > >>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as
> > > > > > > >>>> well. I
> > > > > > > >>>>>> see
> > > > > > > >>>>>>>>> this
> > > > > > > >>>>>>>>>>>> being
> > > > > > > >>>>>>>>>>>>>>>>> separate
> > > > > > > >>>>>>>>>>>>>>>>>> from Ash' solution for DAG grouping in the
> > > > > > > >> UI
> > > > > > > >>>> but
> > > > > > > >>>>>> one
> > > > > > > >>>>>>> of
> > > > > > > >>>>>>>>> the
> > > > > > > >>>>>>>>>>> two
> > > > > > > >>>>>>>>>>>>>> items
> > > > > > > >>>>>>>>>>>>>>>>>> required to replace all existing subdag
> > > > > > > >>>>>> functionality.
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
> > > > > > > >> and
> > > > > > > >>>>> they
> > > > > > > >>>>>>> are
> > > > > > > >>>>>>>>>>> always a
> > > > > > > >>>>>>>>>>>>>> giant
> > > > > > > >>>>>>>>>>>>>>>>> pain
> > > > > > > >>>>>>>>>>>>>>>>>> to use. They are a constant source of user
> > > > > > > >>>>> confusion
> > > > > > > >>>>>>> and
> > > > > > > >>>>>>>>>>>> breakages
> > > > > > > >>>>>>>>>>>>>>>>> during
> > > > > > > >>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
> > > > > > > >> Coder <
> > > > > > > >>>>>>>>>>>> jcoder01@gmail.com>
> > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a
> > > > > > > >> UI
> > > > > > > >>>>>>> concept. I
> > > > > > > >>>>>>>>> use
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> subdag
> > > > > > > >>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If
> > > > > > > >>> you
> > > > > > > >>>>>> have a
> > > > > > > >>>>>>>>> group
> > > > > > > >>>>>>>>>>> of
> > > > > > > >>>>>>>>>>>>>> tasks
> > > > > > > >>>>>>>>>>>>>>>>> that
> > > > > > > >>>>>>>>>>>>>>>>>>> need to finish before another group of
> > > > > > > >> tasks
> > > > > > > >>>>>> start,
> > > > > > > >>>>>>>>> using
> > > > > > > >>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>> subdag
> > > > > > > >>>>>>>>>>>>>>> is
> > > > > > > >>>>>>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies
> > > > > > > >>>> and I
> > > > > > > >>>>>>> think
> > > > > > > >>>>>>>>>> also
> > > > > > > >>>>>>>>>>>> make
> > > > > > > >>>>>>>>>>>>>> it
> > > > > > > >>>>>>>>>>>>>>>>>> easier
> > > > > > > >>>>>>>>>>>>>>>>>>> to follow the dag code.
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
> > > > > > > >> Hamlin
> > > > > > > >>> <
> > > > > > > >>>>>>>>>>>>> hamlin.kn@gmail.com>
> > > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > > > > > >>>>>> Berlin-Taylor
> > > > > > > >>>>>>> <
> > > > > > > >>>>>>>>>>>>>> ash@apache.org
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Question:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator
> > > > > > > >>>> anymore?
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just
> > > > > > > >>>>> replacing
> > > > > > > >>>>>> it
> > > > > > > >>>>>>>>> with
> > > > > > > >>>>>>>>>> a
> > > > > > > >>>>>>>>>>> UI
> > > > > > > >>>>>>>>>>>>>>>>> grouping
> > > > > > > >>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less
> > > > > > > >> to
> > > > > > > >>>> get
> > > > > > > >>>>>>>> wrong,
> > > > > > > >>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>> closer
> > > > > > > >>>>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>>>>> what
> > > > > > > >>>>>>>>>>>>>>>>>>>>> users actually want to achieve with
> > > > > > > >>>> subdags?
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in
> > > > > > > >>>> subdags
> > > > > > > >>>>>>> could
> > > > > > > >>>>>>>>>> start
> > > > > > > >>>>>>>>>>>>>> running
> > > > > > > >>>>>>>>>>>>>>> in
> > > > > > > >>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should
> > > > > > > >> we
> > > > > > > >>>> not
> > > > > > > >>>>>>> also
> > > > > > > >>>>>>>>> just
> > > > > > > > >>>>>>>>>>>>>>> _entirely_
> > > > > > > >>>>>>>>>>>>>>>>>>> remove
> > > > > > > >>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace
> > > > > > > >> it
> > > > > > > >>>> with
> > > > > > > >>>>>>>>> something
> > > > > > > >>>>>>>>>>>>>> simpler.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I
> > > > > > > >>> haven't
> > > > > > > >>>>> used
> > > > > > > >>>>>>>> them
> > > > > > > >>>>>>>>>>>>>> extensively
> > > > > > > >>>>>>>>>>>>>>> so
> > > > > > > >>>>>>>>>>>>>>>>>> may
> > > > > > > >>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> > > > > > > >>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it
> > > > > > > >>>> has(?)
> > > > > > > >>>>> to
> > > > > > > >>>>>>> be
> > > > > > > >>>>>>>> of
> > > > > > > >>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>> form
> > > > > > > >>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
> > > > > > > >>>>>>>>>>>>>>>>>>>>> - They need their own
> > > > > > > >> schedule_interval,
> > > > > > > >>>> but
> > > > > > > >>>>>> it
> > > > > > > >>>>>>>> has
> > > > > > > >>>>>>>>> to
> > > > > > > >>>>>>>>>>>> match
> > > > > > > >>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>> parent
> > > > > > > >>>>>>>>>>>>>>>>>>>> dag
> > > > > > > >>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own.
> > > > > > > >>>> (Does
> > > > > > > >>>>>> it
> > > > > > > >>>>>>>> make
> > > > > > > >>>>>>>>>>> sense
> > > > > > > >>>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>> do
> > > > > > > >>>>>>>>>>>>>>>>>> this?
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the
> > > > > > > >>> sub
> > > > > > > >>>>> dag
> > > > > > > >>>>>>>> would
> > > > > > > >>>>>>>>>>> never
> > > > > > > >>>>>>>>>>>>>>>>> execute, so
> > > > > > > >>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.
> > > > > > > >>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to
> > > > > > > > >>>>> operate a
> > > > > > > >>>>>>>>> subdag
> > > > > > > >>>>>>>>>>> with
> > > > > > > >>>>>>>>>>>>> --
> > > > > > > >>>>>>>>>>>>>>>>> always
> > > > > > > >>>>>>>>>>>>>>>>>> a
> > > > > > > >>>>>>>>>>>>>>>>>>>>> bit of a kludge.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> Thoughts?
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> -ash
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash
> > > > > > > >>>>>> Berlin-Taylor <
> > > > > > > >>>>>>>>>>>>>> ash@apache.org>
> > > > > > > >>>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > > >>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm
> > > > > > > >>>>> excited
> > > > > > > >>>>>> to
> > > > > > > >>>>>>>> see
> > > > > > > >>>>>>>>>> how
> > > > > > > >>>>>>>>>>>>> this
> > > > > > > >>>>>>>>>>>>>>>>>>> progresses.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > > > > > >>> parsing*:
> > > > > > > >>>>> This
> > > > > > > >>>>>>>>>> rewrites
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > > > > > >>> parsing,
> > > > > > > >>>>> and
> > > > > > > >>>>>> it
> > > > > > > >>>>>>>>> will
> > > > > > > >>>>>>>>>>>> give a
> > > > > > > >>>>>>>>>>>>>>> flat
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation
> > > > > > > >>>> already
> > > > > > > >>>>>> does
> > > > > > > >>>>>>>>> this
> > > > > > > >>>>>>>>>> I
> > > > > > > >>>>>>>>>>>>> think.
> > > > > > > >>>>>>>>>>>>>>> At
> > > > > > > >>>>>>>>>>>>>>>>>> least
> > > > > > > >>>>>>>>>>>>>>>>>>>> if
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> I've understood your idea here
> > > > > > > >>>> correctly.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> -ash
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin
> > > > > > > >>>> Huang <
> > > > > > > >>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone and
> > > > > > > >>>> collect
> > > > > > > >>>>>>>>> feedback
> > > > > > > >>>>>>>>>> on
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>> AIP-34
> > > > > > > >>>>>>>>>>>>>>>>>> on
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was
> > > > > > > >>>>>> previously
> > > > > > > >>>>>>>>>> briefly
> > > > > > > >>>>>>>>>>>>>>>>> mentioned in
> > > > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be
> > > > > > > >>> done
> > > > > > > >>>>> for
> > > > > > > >>>>>>>>> Airflow
> > > > > > > >>>>>>>>>>> 2.0,
> > > > > > > >>>>>>>>>>>>> and
> > > > > > > >>>>>>>>>>>>>>>>> one of
> > > > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator
> > > > > > > >>> attach
> > > > > > > >>>>>> tasks
> > > > > > > >>>>>>>> back
> > > > > > > >>>>>>>>>> to
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>> root
> > > > > > > >>>>>>>>>>>>>>>>> DAG.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving
> > > > > > > >>>>>> SubDagOperator
> > > > > > > >>>>>>>>>> related
> > > > > > > >>>>>>>>>>>>>> issues
> > > > > > > >>>>>>>>>>>>>>> by
> > > > > > > >>>>>>>>>>>>>>>>>>>>> reattaching
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag
> > > > > > > >> while
> > > > > > > >>>>>>> respecting
> > > > > > > >>>>>>>>>>>>>> dependencies
> > > > > > > >>>>>>>>>>>>>>>>>> during
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping
> > > > > > > >> effect
> > > > > > > >>>> on
> > > > > > > >>>>>> the
> > > > > > > >>>>>>> UI
> > > > > > > >>>>>>>>>> will
> > > > > > > >>>>>>>>>>> be
> > > > > > > >>>>>>>>>>>>>>>>> achieved
> > > > > > > >>>>>>>>>>>>>>>>>>>> through
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory
> > > > > > > >>>> function
> > > > > > > >>>>>> more
> > > > > > > >>>>>>>>>>> reusable
> > > > > > > >>>>>>>>>>>>>>> because
> > > > > > > >>>>>>>>>>>>>>>>> you
> > > > > > > >>>>>>>>>>>>>>>>>>>> don't
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and
> > > > > > > >>>>>>> child_dag_name
> > > > > > > >>>>>>>>> in
> > > > > > > >>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>> function
> > > > > > > >>>>>>>>>>>>>>>>>>>>> signature
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> anymore.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > > > > > >>> parsing*:
> > > > > > > >>>>> This
> > > > > > > >>>>>>>>>> rewrites
> > > > > > > >>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > > > > > >>> parsing,
> > > > > > > >>>>> and
> > > > > > > >>>>>> it
> > > > > > > >>>>>>>>> will
> > > > > > > >>>>>>>>>>>> give a
> > > > > > > >>>>>>>>>>>>>>> flat
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The
> > > > > > > >> new
> > > > > > > >>>>>>>>> SubDagOperator
> > > > > > > >>>>>>>>>>>> acts
> > > > > > > >>>>>>>>>>>>>>> like a
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> container and most of the original
> > > > > > > >>>>> methods
> > > > > > > >>>>>>> are
> > > > > > > >>>>>>>>>>> removed.
> > > > > > > >>>>>>>>>>>>> The
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> signature is
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> also changed to *subdag_factory
> > > > > > > >> *with
> > > > > > > >>>>>>>>> *subdag_args
> > > > > > > >>>>>>>>>>> *and
> > > > > > > >>>>>>>>>>>>>>>>>>>>> *subdag_kwargs*.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> This is similar to the
> > > > > > > >> PythonOperator
> > > > > > > >>>>>>>> signature.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add
> > > > > > > >>>>>>> current_group
> > > > > > > >>>>>>>> &
> > > > > > > >>>>>>>>>>>>>> parent_group
> > > > > > > >>>>>>>>>>>>>>>>>>>>> attributes
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> to BaseOperator*: This metadata is
> > > > > > > >>> used
> > > > > > > >>>>> to
> > > > > > > >>>>>>>> group
> > > > > > > >>>>>>>>>>> tasks
> > > > > > > >>>>>>>>>>>>> for
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> rendering at
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> UI level. It may potentially extend
> > > > > > > >>>>> further
> > > > > > > >>>>>>> to
> > > > > > > >>>>>>>>>> group
> > > > > > > >>>>>>>>>>>>>>> arbitrary
> > > > > > > >>>>>>>>>>>>>>>>>>> tasks
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to
> > > > > > > >>> allow
> > > > > > > >>>>>>>>> group-level
> > > > > > > >>>>>>>>>>>>>> operations
> > > > > > > >>>>>>>>>>>>>>>>>>> (i.e.
> > > > > > > > >>>>>>>>>>>>>>>>>>>>>>> stop/trigger a group of tasks within
> > > > > > > >>> the
> > > > > > > >>>>>> dag)
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*:
> > > > > > > >> Proposed
> > > > > > > >>>> UI
> > > > > > > >>>>>>>>>> modification
> > > > > > > >>>>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>>>> allow
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a
> > > > > > > >>>> flat
> > > > > > > >>>>>>>>> structure
> > > > > > > >>>>>>>>>> to
> > > > > > > >>>>>>>>>>>>> pair
> > > > > > > >>>>>>>>>>>>>>> with
> > > > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > > > >>>>>>>>>>>>>>>>>>>>> first
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> change instead of the original
> > > > > > > >>>>> hierarchical
> > > > > > > >>>>>>>>>>> structure.
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Please see related documents and
> > > > > > > >> PRs
> > > > > > > >>>> for
> > > > > > > >>>>>>>> details:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> AIP:
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Original Issue:
> > > > > > > >>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/8078
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Draft PR:
> > > > > > > >>>>>>>>>>> https://github.com/apache/airflow/pull/9243
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any
> > > > > > > >>>>> aspects
> > > > > > > >>>>>>>> that
> > > > > > > >>>>>>>>>> you
> > > > > > > >>>>>>>>>>>>>>>>>> agree/disagree
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> with or
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> need more clarification (especially
> > > > > > > >>> the
> > > > > > > >>>>>> third
> > > > > > > >>>>>>>>>> change
> > > > > > > >>>>>>>>>>>>>>> regarding
> > > > > > > >>>>>>>>>>>>>>>>>>>>> TaskGroup).
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Any comments are welcome and I am
> > > > > > > >>>> looking
> > > > > > > >>>>>>>> forward
> > > > > > > >>>>>>>>>> to
> > > > > > > >>>>>>>>>>>> it!
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>> Bin
> > > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>> --
> > > > > > > >>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>> --
> > > > > > > >>>>>>>>>>>>>>> Thanks & Regards
> > > > > > > >>>>>>>>>>>>>>> Poornima
> > > > > > > >>>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>>
> > > > > > > >>>>>>>>>>>
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> --
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Jarek Potiuk
> > > > > > > >>>>>>> Polidea <https://www.polidea.com/> | Principal
> Software
> > > > > Engineer
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> M: +48 660 796 129 <+48%20660%20796%20129>
> <+48660796129
> > > > > > > >>>>> <+48%20660%20796%20129>>
> > > > > > > >>>>>>> [image: Polidea] <https://www.polidea.com/>
> > > > > > > >>>>>>>
> > > > > > > >>>>>>
> > > > > > > >>>>>
> > > > > > > >>>>>
> > > > > > > >>>>> --
> > > > > > > >>>>>
> > > > > > > >>>>> Jarek Potiuk
> > > > > > > >>>>> Polidea <https://www.polidea.com/> | Principal Software
> > > > Engineer
> > > > > > > >>>>>
> > > > > > > >>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > > > > > >>>>> <+48%20660%20796%20129>>
> > > > > > > >>>>> [image: Polidea] <https://www.polidea.com/>
> > > > > > > >>>>>
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> --
> > > > > > > >>>>
> > > > > > > >>>> *Jacob Ferriero*
> > > > > > > >>>>
> > > > > > > >>>> Strategic Cloud Engineer: Data Engineering
> > > > > > > >>>>
> > > > > > > >>>> jferriero@google.com
> > > > > > > >>>>
> > > > > > > >>>> 617-714-2509
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by Yu Qian <yu...@gmail.com>.
Hi, All,
Here's the updated AIP-34
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
The PR has been fine-tuned with better UI interactions and added
serialization of TaskGroup: https://github.com/apache/airflow/pull/10153

Here are some experiment results:
A made-up DAG containing 403 tasks and 5696 edges, grouped as in the code
at the end of this message. Note that inside_section_2 is intentionally
made to depend on all tasks in inside_section_1 to generate a large number
of edges. The observation is that opening the top-level graph is very
quick, around 270ms. Expanding groups that don't have dense dependencies
on other groups is also hardly noticeable, e.g. expanding section_1 takes
330ms. The part that takes time is expanding both inside_section_1 and
inside_section_2. Because there are 2500 edges between these two inner
groups, it took 63 seconds to expand both of them. The majority of the
time (more than 62 seconds) is actually taken by the layout() function in
dagre. In other words, it's very fast to add nodes and edges, but laying
them out on the graph takes time. This issue is not specific to TaskGroup.
Without TaskGroup, if a DAG contains too many edges, it takes time to lay
out the graph too.
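
The task and edge counts quoted above can be checked with a little
arithmetic. The sketch below is only a back-of-the-envelope model of the
example DAG listed at the end of this message. It does not import Airflow,
and it assumes that `a >> b` between two groups links every leaf of `a` to
every root of `b`. The constant and function names are made up for
illustration.

```python
# Back-of-the-envelope model of the example DAG; no Airflow needed.
# Assumption: `a >> b` between groups links every leaf of `a` to every root of `b`.

OUTER = 100  # plain tasks created directly in each section
INNER = 50   # tasks in each of inside_section_1 and inside_section_2


def section_edges():
    """Edges created inside one call to create_section()."""
    fan_out_1 = INNER      # dummies[-1] >> inside_section_1
    fan_out_2 = INNER      # dummies[-2] >> inside_section_2
    dense = INNER * INNER  # inside_section_1 >> inside_section_2
    return fan_out_1 + fan_out_2 + dense


def section_roots():
    """Tasks with no upstream inside the section: only the outer tasks."""
    return OUTER


def section_leaves():
    """Outer tasks except the two feeding the inner groups, plus inside_section_2."""
    return (OUTER - 2) + INNER


# Two sections plus start, some-other-task and end.
tasks = 2 * (OUTER + 2 * INNER) + 3
edges = (
    2 * section_edges()
    + 2 * section_roots()   # start >> section_1, some-other-task >> section_2
    + 2 * section_leaves()  # section_1 >> some-other-task, section_2 >> end
)
print(tasks, edges)  # prints: 403 5696
```

The dense 50 x 50 link between the two inner groups accounts for 5000 of
the 5696 edges, which is why expanding those two groups dominates the
layout time.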

On the other hand, a more realistic experiment with a production DAG
containing about 400 tasks and 700 edges showed that grouping tasks into
three levels of nested TaskGroups cut the upfront page opening time from
around 6s to 500ms. (Obviously the time is paid back when the user
gradually expands all the groups one by one, but normally people don't need
to expand every group every time, so it's still a big saving.) The
experiments were done on OS X Mojave, 2.2 GHz Intel Core i7, 16GB memory,
Chrome.

I can see a few possible improvements to TaskGroup (or how it's used) that
can be done as a next step:
1). As Gerard suggested, we can implement lazy-loading. Instead of
displaying the whole DAG, we can limit the Graph View to show only a single
TaskGroup, omitting its edges going out to other TaskGroups. This behaviour
is more like SubDagOperator, where users can zoom into/out of a TaskGroup
and look at only the tasks within that TaskGroup as if those are the only
tasks on the DAG. This can be done either with background JavaScript calls
or by making a new GET request with filtering parameters. The downside is
that it's not as explicit as showing all the dependencies on the graph.
2). Users can improve the organization of the DAG themselves to reduce the
number of edges. E.g. if every task in group2 depends on every task in
group1, instead of doing group1 >> group2, they can add a DummyOperator in
between and do this: group1 >> dummy >> group2. This cuts down the number
of edges significantly (from 2500 to 100 for two groups of 50 tasks) and
page load becomes much faster.
3). If we really want, we can improve the >> operator of TaskGroup to do 2)
automatically. If it sees that both sides of >> are TaskGroups, it can
create a DummyOperator on behalf of the user. The downside is that it may
be too much magic.
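
To make the saving in 2) and 3) concrete, here is a minimal sketch in plain
Python (no Airflow involved) that counts the edges both ways for two groups
of 50 tasks. The helper names `direct_edges` and `bridged_edges` and the
task ids are hypothetical, chosen only for this illustration.

```python
def direct_edges(upstream, downstream):
    """Edges created by linking every upstream task to every downstream task."""
    return {(u, d) for u in upstream for d in downstream}


def bridged_edges(upstream, downstream, bridge="dummy"):
    """Edges created when a single bridge task sits between the two groups."""
    return {(u, bridge) for u in upstream} | {(bridge, d) for d in downstream}


group1 = [f"group1.task-{i + 1}" for i in range(50)]
group2 = [f"group2.task-{i + 1}" for i in range(50)]

print(len(direct_edges(group1, group2)))   # 2500
print(len(bridged_edges(group1, group2)))  # 100
```

The bridge turns n x m edges into n + m. Assuming the default trigger rule,
the ordering is unchanged: the bridge task waits for all of group1, and
every task in group2 waits for the bridge.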

Thanks,
Qian

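# Import paths as used on the Airflow development branch at the time of
# this PR (an assumption; they may differ in other Airflow versions).
from airflow.models.dag import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.dates import days_ago
from airflow.utils.task_group import TaskGroup


def create_section():
    """
    Create tasks in the outer section.
    """
    # 100 plain tasks directly inside the enclosing TaskGroup.
    dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]

    with TaskGroup("inside_section_1") as inside_section_1:
        _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]

    with TaskGroup("inside_section_2") as inside_section_2:
        _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]

    dummies[-1] >> inside_section_1
    dummies[-2] >> inside_section_2
    # Every task in inside_section_2 depends on every task in
    # inside_section_1: 50 x 50 = 2500 edges.
    inside_section_1 >> inside_section_2


with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
    start = DummyOperator(task_id="start")

    with TaskGroup("section_1") as section_1:
        create_section()

    some_other_task = DummyOperator(task_id="some-other-task")

    with TaskGroup("section_2") as section_2:
        create_section()

    end = DummyOperator(task_id='end')

    start >> section_1 >> some_other_task >> section_2 >> end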


On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
<gc...@twitter.com.invalid> wrote:

> Re graph times. That makes sense. Let me know what you find. We may be able
> to contribute on the lazy loading part.
>
> Looking forward to seeing the updated AIP!
>
>
> Gerard Casas Saez
> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
>
>
> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <ka...@gmail.com> wrote:
>
> > Permissions granted, let me know if you face any issues.
> >
> > On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yu...@gmail.com> wrote:
> >
> > > Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> > >
> > > On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <ka...@gmail.com>
> wrote:
> > >
> > > > What's your ID i.e. if you haven't created an account yet, please
> > create
> > > > one at https://cwiki.apache.org/confluence/signup.action and send us
> > > your
> > > > ID and we will add permissions.
> > > >
> > > > Thanks. I'll edit the AIP. May I request permission to edit it?
> > > > > My wiki user email is yuqian1990@gmail.com.
> > > >
> > > >
> > > > On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yu...@gmail.com>
> wrote:
> > > >
> > > > > Re, Xinbin. Thanks. I'll edit the AIP. May I request permission to
> > edit
> > > > it?
> > > > > My wiki user email is yuqian1990@gmail.com.
> > > > >
> > > > > > Re Gerard: yes, the UI loads all the nodes as JSON from the webserver
> > > > > > at once. However, it only adds the top-level nodes and edges to the
> > > > > > graph when the Graph View page is first opened, and then adds the
> > > > > > expanded nodes to the graph as the user expands them. From what I've
> > > > > > experienced with DAGs containing around 400 tasks (not using TaskGroup
> > > > > > or SubDagOperator), opening the whole DAG in Graph View usually takes
> > > > > > 5 seconds. Less than 60ms of that is taken by loading the data from
> > > > > > the webserver. The remaining 4.9s+ is taken by JavaScript functions in
> > > > > > dagre-d3.min.js such as createNodes, createEdgeLabels, etc. and by
> > > > > > rendering the graph. With TaskGroup being used to group tasks into a
> > > > > > smaller number of top-level nodes, the amount of data loaded from the
> > > > > > webserver will remain about the same compared to a flat DAG of the same
> > > > > > size, but the number of nodes and edges that need to be plotted on the
> > > > > > graph can be reduced significantly. So in theory this should reduce
> > > > > > the time it takes to open Graph View even without lazy-loading the data
> > > > > > (I'll experiment to find out). That said, if it comes to a point where
> > > > > > lazy-loading helps, we can still implement it as an improvement.
> > > > >
> > > > > > Re James: the Tree View looks as if all the groups are fully expanded
> > > > > > (because under the hood all the tasks are in a single DAG). I'm less
> > > > > > worried about Tree View at the moment because it already has a
> > > > > > mechanism for collapsing tasks by the dependency tree. That said, the
> > > > > > Tree View can definitely be improved too with TaskGroup (e.g. collapse
> > > > > > tasks in the same TaskGroup when Tree View is first opened).
> > > > > >
> > > > > > Neither suggestion requires fundamental changes to the idea. I think
> > > > > > we can have a basic working TaskGroup first, and then improve it
> > > > > > incrementally in several PRs as we get more feedback from the
> > > > > > community. What do you think?
> > > > >
> > > > > Qian
> > > > >
> > > > >
> > > > > On Wed, Aug 12, 2020 at 9:15 AM James Coder <jc...@gmail.com>
> > > wrote:
> > > > >
> > > > > > I agree this looks great, one question, how does the tree view
> > look?
> > > > > >
> > > > > > James Coder
> > > > > >
> > > > > > > On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <
> > > > gcasassaez@twitter.com
> > > > > .invalid>
> > > > > > wrote:
> > > > > > >
> > > > > > > First of all, this is awesome!!
> > > > > > >
> > > > > > > > Secondly, checking your UI code, it seems you are loading all
> > > > > > > > operators at once. I'm wondering if we can load them as needed (i.e.
> > > > > > > > load them whenever we click the TaskGroup). Some of our DAGs are so
> > > > > > > > large that they take forever to load in the Graph View, so I'm
> > > > > > > > worried about this still being an issue here. It may be easily
> > > > > > > > solvable by implementing lazy loading of the graph. I'm not sure how
> > > > > > > > easy that is to implement or add to the UI extension (and I don't
> > > > > > > > want to push for early optimization, as it's the root of all evil).
> > > > > > > Gerard Casas Saez
> > > > > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > > > >
> > > > > > >
> > > > > > >> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <
> > > > bin.huangxb@gmail.com>
> > > > > > wrote:
> > > > > > >>
> > > > > > >> Hi Yu,
> > > > > > >>
> > > > > > > >> Thank you so much for taking this on. I was fairly distracted
> > > > > > > >> previously and I didn't have the time to update the proposal. In
> > > > > > > >> fact, after discussing with Ash, Kaxil and Daniel, the direction of
> > > > > > > >> this AIP has been changed to favor the concept of TaskGroup instead
> > > > > > > >> of rewriting SubDagOperator (though it may make sense to deprecate
> > > > > > > >> SubDag at a future date).
> > > > > > > >>
> > > > > > > >> Your PR is amazing and it has implemented the desired features. I
> > > > > > > >> think we can focus on your new PR instead. Do you mind updating the
> > > > > > > >> AIP based on what you have done in your PR?
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Bin
> > > > > > >>
> > > > > > >>
> > > > > > >>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <
> yuqian1990@gmail.com>
> > > > > wrote:
> > > > > > >>>
> > > > > > > >>> Hi, all, I've added the basic UI changes to my proposed
> > > > > > > >>> implementation of TaskGroup as a UI grouping concept:
> > > > > > > >>> https://github.com/apache/airflow/pull/10153
> > > > > > > >>>
> > > > > > > >>> I think Chris had a pretty good specification of TaskGroup so I'm
> > > > > > > >>> quoting it here. The only thing I don't fully agree with is the
> > > > > > > >>> restriction "... *cannot* have dependencies between a Task in a
> > > > > > > >>> TaskGroup and either a Task in a different TaskGroup or a Task not
> > > > > > > >>> in any group". I think this is overly restrictive. Since TaskGroup
> > > > > > > >>> is a UI concept, tasks can have dependencies on tasks in other
> > > > > > > >>> TaskGroups or on tasks not in any TaskGroup. In my PR, this is
> > > > > > > >>> allowed. The graph edges will update accordingly when TaskGroups
> > > > > > > >>> are expanded/collapsed. TaskGroup only helps to make the UI look
> > > > > > > >>> less crowded. Under the hood, everything is still a DAG of tasks
> > > > > > > >>> and edges, so things work normally. Here's a screenshot of the UI
> > > > > > > >>> interaction:
> > > > > > > >>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> *   - Tasks can be added to a TaskGroup   - You *can* have
> > > > > dependencies
> > > > > > >>> between Tasks in the same TaskGroup, but   *cannot* have
> > > > dependencies
> > > > > > >>> between a Task in a TaskGroup and either a   Task in a
> > different
> > > > > > >> TaskGroup
> > > > > > >>> or a Task not in any group   - You *can* have dependencies
> > > between
> > > > a
> > > > > > >>> TaskGroup and either other   TaskGroups or Tasks not in any
> > group
> > > >  -
> > > > > > The
> > > > > > >>> UI will by default render a TaskGroup as a single "object",
> but
> > > > >  which
> > > > > > >> you
> > > > > > >>> expand or zoom into in some way   - You'd need some way to
> > > > determine
> > > > > > what
> > > > > > >>> the "status" of a TaskGroup was   at least for UI display
> > > purposes*
> > > > > > >>>
> > > > > > >>>
> > > > > > > >>> Regarding Jake's comment, I agree it's possible to implement the
> > > > > > > >>> "retrying tasks in a group" pattern he mentioned as an optional
> > > > > > > >>> feature of TaskGroup, although that may go against having TaskGroup
> > > > > > > >>> as a pure UI concept. For the motivating example Jake provided, I
> > > > > > > >>> suggest implementing both SubmitLongRunningJobTask and
> > > > > > > >>> PollJobStatusSensor in a single operator. It can do something like
> > > > > > > >>> what BaseSensorOperator.execute() does in "reschedule" mode, i.e. it
> > > > > > > >>> first executes some code to submit the long-running job to the
> > > > > > > >>> external service and stores the state (e.g. in XCom), then
> > > > > > > >>> reschedules itself. Subsequent runs then poke for the completion
> > > > > > > >>> state.
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> > > > > > >> <jferriero@google.com.invalid
> > > > > > >>>>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>>> I really like this idea of a TaskGroup container as I think
> > this
> > > > > will
> > > > > > >> be
> > > > > > >>>> much easier to use than SubDag.
> > > > > > >>>>
> > > > > > >>>> I'd like to propose an optional behavior for special retry
> > > > mechanics
> > > > > > >> via
> > > > > > >>> a
> > > > > > >>>> TaskGroup.retry_all property.
> > > > > > >>>> This way I could use TaskGroup to replace my favorite use of
> > > > SubDag
> > > > > > for
> > > > > > >>>> atomically retrying tasks of the pattern "act on external
> > state
> > > > then
> > > > > > >>>> reschedule poll until desired state reached".
> > > > > > >>>>
> > > > > > > >>>> The motivating use case I have for a SubDag is a very simple
> > > > > > > >>>> two-task group [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > > > > > >>>> I use SubDag because it gives me an easy way to retry the
> > > > > > > >>>> SubmitJobTask if something about the PollJobSensor fails.
> > > > > > > >>>> This pattern would be really nice for jobs that are expected to run
> > > > > > > >>>> a long time (because the sensor can use reschedule mode, freeing up
> > > > > > > >>>> slots) but might fail for a retryable reason.
> > > > > > > >>>> However, using SubDag to meet this use case defeats the purpose
> > > > > > > >>>> because SubDag infamously
> > > > > > > >>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> > > > > > > >>>> blocks a "controller" slot for the entire duration.
> > > > > > > >>>> This may feel like cyclic behavior, but in reality it is very
> > > > > > > >>>> common for a single operator to submit a job and wait until it is
> > > > > > > >>>> done. We could use this case to refactor many operators (e.g. BQ,
> > > > > > > >>>> Dataproc, Dataflow) to be implemented as
> > > > > > > >>>> TaskGroup[SubmitTask >> PollTask] with an optional reschedule mode
> > > > > > > >>>> if the user knows that the job may take a long time.
> > > > > > >>>>
> > > > > > > >>>> I'd be happy to do the development work on adding this specific
> > > > > > > >>>> retry behavior to TaskGroup once the base concept is implemented,
> > > > > > > >>>> if others in the community would find this a useful feature.
> > > > > > >>>>
> > > > > > >>>> Cheers,
> > > > > > >>>> Jake
> > > > > > >>>>
> > > > > > >>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> > > > > > Jarek.Potiuk@polidea.com
> > > > > > >>>
> > > > > > >>>> wrote:
> > > > > > >>>>
> > > > > > > >>>>> All for it :) . I think we are getting closer to having regular
> > > > > > > >>>>> planning and a structured approach to 2.0, and to starting a task
> > > > > > > >>>>> force for it soon, so I think it should be perfectly fine to
> > > > > > > >>>>> discuss and even start implementing what's beyond as soon as we
> > > > > > > >>>>> make sure that we are prioritizing 2.0 work.
> > > > > > >>>>>
> > > > > > >>>>> J,
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <
> > yuqian1990@gmail.com>
> > > > > > >> wrote:
> > > > > > >>>>>
> > > > > > >>>>>> Hi Jarek,
> > > > > > >>>>>>
> > > > > > > >>>>>> I agree we should not change the behaviour of the existing
> > > > > > > >>>>>> SubDagOperator till Airflow 2.1. Is it okay to continue the
> > > > > > > >>>>>> discussion about TaskGroup as a brand new concept/feature
> > > > > > > >>>>>> independent from the existing SubDagOperator? In other words,
> > > > > > > >>>>>> shall we add TaskGroup as a UI grouping concept like Ash
> > > > > > > >>>>>> suggested, and not touch SubDagOperator at all? Whenever we are
> > > > > > > >>>>>> ready with TaskGroup, we then deprecate SubDagOperator in Airflow
> > > > > > > >>>>>> 2.1.
> > > > > > >>>>>>
> > > > > > >>>>>> I really like Ash's idea of simplifying the SubDagOperator
> > > idea
> > > > > > >> into
> > > > > > >>> a
> > > > > > >>>>>> simple UI grouping concept. I think Xinbin's idea of
> > > > "reattaching
> > > > > > >> all
> > > > > > >>>> the
> > > > > > >>>>>> tasks to the root DAG" is the way to go. And I see James
> > > pointed
> > > > > > >> out
> > > > > > >>> we
> > > > > > >>>>>> need some helper functions to simplify dependencies
> setting
> > of
> > > > > > >>>> TaskGroup.
> > > > > > >>>>>> Xinbin put up a pretty elegant example in his PR
> > > > > > >>>>>> <https://github.com/apache/airflow/pull/9243>. I think
> > having
> > > > > > >>>> TaskGroup
> > > > > > >>>>> as
> > > > > > >>>>>> a UI concept should be a relatively small change. We can
> > > > simplify
> > > > > > >>>>> Xinbin's
> > > > > > >>>>>> PR further. So I put up this alternative proposal here:
> > > > > > >>>>>> https://github.com/apache/airflow/pull/10153
> > > > > > >>>>>>
> > > > > > >>>>>> I have not done any UI changes due to lack of experience
> > with
> > > > web
> > > > > > >> UI.
> > > > > > >>>> If
> > > > > > >>>>>> anyone's interested, please take a look at the PR.
> > > > > > >>>>>>
> > > > > > >>>>>> Qian
> > > > > > >>>>>>
> > > > > > >>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > > > > > >>> Jarek.Potiuk@polidea.com
> > > > > > >>>>>
> > > > > > >>>>>> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>>> Similar point here to the other ideas that are popping
> up.
> > > > Maybe
> > > > > > >> we
> > > > > > >>>>>> should
> > > > > > >>>>>>> just focus on completing 2.0 and make all discussions
> about
> > > > > > >> further
> > > > > > >>>>>>> improvements to 2.1? While those are important
> discussions
> > > (and
> > > > > > >> we
> > > > > > >>>>> should
> > > > > > >>>>>>> continue them in the  near future !) I think at this
> point
> > > > > > >> focusing
> > > > > > >>>> on
> > > > > > >>>>>>> delivering 2.0 in its current shape should be our focus
> > now ?
> > > > > > >>>>>>>
> > > > > > >>>>>>> J.
> > > > > > >>>>>>>
> > > > > > >>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > > > > > >>> bin.huangxb@gmail.com>
> > > > > > >>>>>>> wrote:
> > > > > > >>>>>>>
> > > > > > >>>>>>>> Hi Daniel
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> I agree that the TaskGroup should have the same API as a
> > DAG
> > > > > > >>> object
> > > > > > >>>>>>> related
> > > > > > >>>>>>>> to task dependencies, but it will not have anything
> > related
> > > to
> > > > > > >>>> actual
> > > > > > >>>>>>>> execution or scheduling.
> > > > > > >>>>>>>> I will update the AIP according to this over the
> weekend.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>> We could even make a “DAGTemplate” object s.t. when you
> > > > > > >> import
> > > > > > >>>> the
> > > > > > >>>>>>> object
> > > > > > >>>>>>>> you can import it with parameters to determine the shape
> > of
> > > > the
> > > > > > >>>> DAG.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Can you elaborate a bit more on this? Does it serve a
> > > similar
> > > > > > >>>> purpose
> > > > > > >>>>>> as
> > > > > > >>>>>>> a
> > > > > > >>>>>>>> DAG factory function?
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > > > > > >>>>>>> daniel.imberman@gmail.com
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>> wrote:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>> Hi Bin,
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Why not give the TaskGroup the same API as a DAG object
> > > (e.g.
> > > > > > >>> the
> > > > > > >>>>>>> bitwise
> > > > > > >>>>>>>>> operator fro task dependencies). We could even make a
> > > > > > >>>> “DAGTemplate”
> > > > > > >>>>>>>> object
> > > > > > >>>>>>>>> s.t. when you import the object you can import it with
> > > > > > >>> parameters
> > > > > > >>>>> to
> > > > > > >>>>>>>>> determine the shape of the DAG.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > > > > > >>>>> bin.huangxb@gmail.com
> > > > > > >>>>>>>
> > > > > > >>>>>>>>> wrote:
> > > > > > >>>>>>>>> The TaskGroup will not take schedule interval as a
> > > parameter
> > > > > > >>>>> itself,
> > > > > > >>>>>>> and
> > > > > > >>>>>>>> it
> > > > > > >>>>>>>>> depends on the DAG where it attaches to. In my opinion,
> > the
> > > > > > >>>>> TaskGroup
> > > > > > >>>>>>>> will
> > > > > > >>>>>>>>> only contain a group of tasks with interdependencies,
> and
> > > the
> > > > > > >>>>>> TaskGroup
> > > > > > >>>>>>>>> behaves like a task. It doesn't contain any
> > > > > > >>> execution/scheduling
> > > > > > >>>>>> logic
> > > > > > >>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs
> > etc.)
> > > > > > >>> like
> > > > > > >>>> a
> > > > > > >>>>>> DAG
> > > > > > >>>>>>>>> does.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>>> For example, there is the scenario that the schedule
> > > > > > >> interval
> > > > > > >>>> of
> > > > > > >>>>>> DAG
> > > > > > >>>>>>> is
> > > > > > >>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20
> min.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> I am curious why you ask this. Is this a use case that
> > you
> > > > > > >> want
> > > > > > >>>> to
> > > > > > >>>>>>>> achieve?
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Bin
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> > > > > > >> thanosxnicholas@gmail.com
> > > > > > >>>>
> > > > > > >>>>>> wrote:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>>> Hi Bin,
> > > > > > >>>>>>>>>> Using TaskGroup, Is the schedule interval of TaskGroup
> > the
> > > > > > >>> same
> > > > > > >>>>> as
> > > > > > >>>>>>> the
> > > > > > >>>>>>>>>> parent DAG? My main concern is whether the schedule
> > > > > > >> interval
> > > > > > >>> of
> > > > > > >>>>>>>> TaskGroup
> > > > > > >>>>>>>>>> could be different with that of the DAG? For example,
> > > there
> > > > > > >>> is
> > > > > > >>>>> the
> > > > > > >>>>>>>>> scenario
> > > > > > >>>>>>>>>> that the schedule interval of DAG is 1 hour and the
> > > > > > >> schedule
> > > > > > >>>>>> interval
> > > > > > >>>>>>>> of
> > > > > > >>>>>>>>>> TaskGroup is 20 min.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Cheers,
> > > > > > >>>>>>>>>> Nicholas
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > > > > > >>>>>> bin.huangxb@gmail.com
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>> Hi Nicholas,
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> I am not sure about the old behavior of
> SubDagOperator,
> > > > > > >>> maybe
> > > > > > >>>>> it
> > > > > > >>>>>>> will
> > > > > > >>>>>>>>>> throw
> > > > > > >>>>>>>>>>> an error? But in the original proposal, the subdag's
> > > > > > >>>>>>>> schedule_interval
> > > > > > >>>>>>>>>> will
> > > > > > >>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to
> replace
> > > > > > >>>> SubDag,
> > > > > > >>>>>>> there
> > > > > > >>>>>>>>>> will
> > > > > > >>>>>>>>>>> be no subdag schedule_interval.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> Bin
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > > > > > >>>> thanosxnicholas@gmail.com
> > > > > > >>>>>>
> > > > > > >>>>>>>> wrote:
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>>> Hi Bin,
> > > > > > >>>>>>>>>>>> Thanks for your good proposal. I was confused
> whether
> > > > > > >> the
> > > > > > >>>>>>> schedule
> > > > > > >>>>>>>>>>>> interval of SubDAG is different from that of the
> > parent
> > > > > > >>>> DAG?
> > > > > > >>>>> I
> > > > > > >>>>>>> have
> > > > > > >>>>>>>>>>>> discussed with Jiajie Zhong about the schedule
> > interval
> > > > > > >>> of
> > > > > > >>>>>>> SubDAG.
> > > > > > >>>>>>>> If
> > > > > > >>>>>>>>>> the
> > > > > > >>>>>>>>>>>> SubDagOperator has a different schedule interval,
> what
> > > > > > >>> will
> > > > > > >>>>>>> happen
> > > > > > >>>>>>>>> for
> > > > > > >>>>>>>>>>> the
> > > > > > >>>>>>>>>>>> scheduler to schedule the parent DAG?
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>> Regards,
> > > > > > >>>>>>>>>>>> Nicholas Jiang
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > > > > >>>>>>>> bin.huangxb@gmail.com>
> > > > > > >>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback!
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> I have rethought about the concept of subdag and
> task
> > > > > > >>>>>> groups. I
> > > > > > >>>>>>>>> think
> > > > > > >>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>> better way to approach this is to entirely remove
> > > > > > >>> subdag
> > > > > > >>>>> and
> > > > > > >>>>>>>>>> introduce
> > > > > > >>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>> concept of TaskGroup, which is a container of tasks
> > > > > > >>> along
> > > > > > >>>>>> with
> > > > > > >>>>>>>>> their
> > > > > > >>>>>>>>>>>>> dependencies *without execution/scheduling logic
> as a
> > > > > > >>>> DAG*.
> > > > > > >>>>>> The
> > > > > > >>>>>>>>> only
> > > > > > >>>>>>>>>>>>> purpose of it is to group a list of tasks, but you
> > > > > > >>> still
> > > > > > >>>>> need
> > > > > > >>>>>>> to
> > > > > > >>>>>>>>> add
> > > > > > >>>>>>>>>> it
> > > > > > >>>>>>>>>>>> to
> > > > > > >>>>>>>>>>>>> a DAG for execution.
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> Here is a small code snippet.
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> ```
> > > > > > >>>>>>>>>>>>> class TaskGroup:
> > > > > > >>>>>>>>>>>>> """
> > > > > > >>>>>>>>>>>>> A TaskGroup contains a group of tasks.
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> If default_args is missing, it will take default
> args
> > > > > > >>>> from
> > > > > > >>>>>> the
> > > > > > >>>>>>>>>> DAG.
> > > > > > >>>>>>>>>>>>> """
> > > > > > >>>>>>>>>>>>> def __init__(self, group_id, default_args):
> > > > > > >>>>>>>>>>>>> pass
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> """
> > > > > > >>>>>>>>>>>>> You can add tasks to a task group similar to adding
> > > > > > >>> tasks
> > > > > > >>>>> to
> > > > > > >>>>>> a
> > > > > > >>>>>>>> DAG
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> This can be declared in a separate file from the
> dag
> > > > > > >>> file
> > > > > > >>>>>>>>>>>>> """
> > > > > > >>>>>>>>>>>>> download_group = TaskGroup(group_id='download',
> > > > > > >>>>>>>>>>>> default_args=default_args)
> > > > > > >>>>>>>>>>>>> download_group.add_task(task1)
> > > > > > >>>>>>>>>>>>> task2.dag = download_group
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> with download_group:
> > > > > > >>>>>>>>>>>>> task3 = DummyOperator(task_id='task3')
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> [task, task2] >> task3
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> """Add it to a DAG for execution"""
> > > > > > >>>>>>>>>>>>> with DAG(dag_id='start_download_dag',
> > > > > > >>>>>>> default_args=default_args,
> > > > > > >>>>>>>>>>>>> schedule_interval='@daily', ...) as dag:
> > > > > > >>>>>>>>>>>>> start = DummyOperator(task_id='start')
> > > > > > >>>>>>>>>>>>> start >> download_group
> > > > > > >>>>>>>>>>>>> # this is equivalent to
> > > > > > >>>>>>>>>>>>> # start >> [task, task2] >> task3
> > > > > > >>>>>>>>>>>>> ```
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> With this, we can still reuse a group of tasks and
> > > > > > >> set
> > > > > > >>>>>>>> dependencies
> > > > > > >>>>>>>>>>>> between
> > > > > > >>>>>>>>>>>>> them; it avoids the boilerplate code from using
> > > > > > >>>>>> SubDagOperator,
> > > > > > >>>>>>>> and
> > > > > > >>>>>>>>>> we
> > > > > > >>>>>>>>>>>> can
> > > > > > >>>>>>>>>>>>> declare dependencies as `task >> task_group >>
> task`.
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> User migration wise, we can introduce it before
> > > > > > >> Airflow
> > > > > > >>>> 2.0
> > > > > > >>>>>> and
> > > > > > >>>>>>>>> allow
> > > > > > >>>>>>>>>>>>> gradual transition. Then we can decide if we still
> > > > > > >> want
> > > > > > >>>> to
> > > > > > >>>>>> keep
> > > > > > >>>>>>>> the
> > > > > > >>>>>>>>>>>>> SubDagOperator or simply remove it.
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> Any thoughts?
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> Cheers,
> > > > > > >>>>>>>>>>>>> Bin
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > > > > > >>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> +1, proposal looks good.
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> The original intention was really to have tasks
> > > > > > >>> groups
> > > > > > >>>>> and
> > > > > > >>>>>> a
> > > > > > >>>>>>>>>>>> zoom-in/out
> > > > > > >>>>>>>>>>>>> in
> > > > > > >>>>>>>>>>>>>> the UI. The original reasoning was to reuse the
> DAG
> > > > > > >>>>> object
> > > > > > >>>>>>>> since
> > > > > > >>>>>>>>> it
> > > > > > >>>>>>>>>>> is
> > > > > > >>>>>>>>>>>> a
> > > > > > >>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> > > > > > >>> create
> > > > > > >>>>>>>> underlying
> > > > > > >>>>>>>>>>>>>> confusions since a DAG is much more than just a
> > > > > > >> group
> > > > > > >>>> of
> > > > > > >>>>>>> tasks.
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> Max
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > > > > >>>>>>>>>>>>> joshipoornima06@gmail.com>
> > > > > > >>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> Thank you for your email.
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > > > > >>>>>>>>>>> bin.huangxb@gmail.com
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> > > > > > >>>>>> rewrites
> > > > > > >>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > > >>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > > > > > >> it
> > > > > > >>>>> will
> > > > > > >>>>>>>> give a
> > > > > > >>>>>>>>>>>> flat
> > > > > > >>>>>>>>>>>>>>>>>> structure at
> > > > > > >>>>>>>>>>>>>>>>>> the task level
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> The serialized_dag representation already
> > > > > > >> does
> > > > > > >>>>> this I
> > > > > > >>>>>>>>> think.
> > > > > > >>>>>>>>>> At
> > > > > > >>>>>>>>>>>>> least
> > > > > > >>>>>>>>>>>>>>> if
> > > > > > >>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > > > > > >>> representation,
> > > > > > >>>>> but
> > > > > > >>>>>> at
> > > > > > >>>>>>>>> least
> > > > > > >>>>>>>>>>> it
> > > > > > >>>>>>>>>>>>> will
> > > > > > >>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
> > > > > > >> In
> > > > > > >>> my
> > > > > > >>>>>>>> proposal
> > > > > > >>>>>>>>> as
> > > > > > >>>>>>>>>>>> also
> > > > > > >>>>>>>>>>>>> in
> > > > > > >>>>>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
> > > > > > >> from
> > > > > > >>>> the
> > > > > > >>>>>>> subdag
> > > > > > >>>>>>>>> and
> > > > > > >>>>>>>>>>> add
> > > > > > >>>>>>>>>>>>>> them
> > > > > > >>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG graph
> > > > > > >>>> will
> > > > > > >>>>>> look
> > > > > > >>>>>>>>>> exactly
> > > > > > >>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>> same as without subdag but with metadata
> > > > > > >> attached
> > > > > > >>>> to
> > > > > > >>>>>>> those
> > > > > > >>>>>>>>>>>> sections.
> > > > > > >>>>>>>>>>>>>>> These
> > > > > > >>>>>>>>>>>>>>>> metadata will be later on used to render in the
> > > > > > >>> UI.
> > > > > > >>>>> So
> > > > > > >>>>>>>> after
> > > > > > >>>>>>>>>>>> parsing
> > > > > > >>>>>>>>>>>>> (
> > > > > > >>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
> > > > > > >> the
> > > > > > >>>>>>> *root_dag
> > > > > > >>>>>>>>>>>> *instead
> > > > > > >>>>>>>>>>>>> of
> > > > > > >>>>>>>>>>>>>>> *root_dag +
> > > > > > >>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > > > > > >>>>>>>>>> current_group=section-1,
> > > > > > >>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> > > > > > >>> naming
> > > > > > >>>>>>>>>>> suggestions),
> > > > > > >>>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>> reason for parent_group is that we can have
> > > > > > >>> nested
> > > > > > >>>>>> group
> > > > > > >>>>>>>> and
> > > > > > >>>>>>>>>>>> still
> > > > > > >>>>>>>>>>>>>> be
> > > > > > >>>>>>>>>>>>>>>> able to capture the dependency.
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> Runtime DAG:
> > > > > > >>>>>>>>>>>>>>>> [image: image.png]
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> While at the UI, what we see would be something
> > > > > > >>>> like
> > > > > > >>>>>> this
> > > > > > >>>>>>>> by
> > > > > > >>>>>>>>>>>>> utilizing
> > > > > > >>>>>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into
> > > > > > >> in
> > > > > > >>>> some
> > > > > > >>>>>>> way.
> > > > > > >>>>>>>>>>>>>>>> [image: image.png]
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> The benefits I can see is that:
> > > > > > >>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > > > > > >>> complexity
> > > > > > >>>> of
> > > > > > >>>>>>>> SubDag
> > > > > > >>>>>>>>>> for
> > > > > > >>>>>>>>>>>>>>> execution
> > > > > > >>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> > > > > > >> using
> > > > > > >>>>>> SubDag.
> > > > > > >>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
> > > > > > >>>>> reusable
> > > > > > >>>>>>> dag
> > > > > > >>>>>>>>> code
> > > > > > >>>>>>>>>>> and
> > > > > > >>>>>>>>>>>>>>>> declare dependencies between them. And with the
> > > > > > >>> new
> > > > > > >>>>>>>>>>> SubDagOperator
> > > > > > >>>>>>>>>>>>> (see
> > > > > > >>>>>>>>>>>>>>> AIP
> > > > > > >>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
> > > > > > >>>>> function
> > > > > > >>>>>>> for
> > > > > > >>>>>>>>>>>>> generating 1
> > > > > > >>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
> > > > > > >>> (in
> > > > > > >>>>> this
> > > > > > >>>>>>>> case,
> > > > > > >>>>>>>>>> it
> > > > > > >>>>>>>>>>>> will
> > > > > > >>>>>>>>>>>>>>> just
> > > > > > >>>>>>>>>>>>>>>> extract all underlying tasks and append to the
> > > > > > >>> root
> > > > > > >>>>>> dag).
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> > > > > > >>>> with a
> > > > > > >>>>>>>>>> simpler
> > > > > > >>>>>>>>>>>>>> concept
> > > > > > >>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> > > > > > >> out
> > > > > > >>>> the
> > > > > > >>>>>>>>>> contents
> > > > > > >>>>>>>>>>>> of
> > > > > > >>>>>>>>>>>>> a
> > > > > > >>>>>>>>>>>>>>> SubDag
> > > > > > >>>>>>>>>>>>>>>> and becomes more like
> > > > > > >>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > > > > > >>>>>>>>>>>>>>> (forgive
> > > > > > >>>>>>>>>>>>>>>> me about the crazy name..). In this case, it is
> > > > > > >>>> still
> > > > > > >>>>>>>>>>> necessary
> > > > > > >>>>>>>>>>>> to
> > > > > > >>>>>>>>>>>>>>> keep the
> > > > > > >>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a
> > > > > > >>>> name?
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> > > > > > >>>> Chris
> > > > > > >>>>>>> Palmer
> > > > > > >>>>>>>>> for
> > > > > > >>>>>>>>>>>>> helping
> > > > > > >>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup, I
> > > > > > >>>> will
> > > > > > >>>>>> just
> > > > > > >>>>>>>>> paste
> > > > > > >>>>>>>>>>> it
> > > > > > >>>>>>>>>>>>>> here.
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > > > > > >> in
> > > > > > >>>> the
> > > > > > >>>>>> same
> > > > > > >>>>>>>>>>>> TaskGroup,
> > > > > > >>>>>>>>>>>>>> but
> > > > > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > > > > > >> a
> > > > > > >>>>>>> TaskGroup
> > > > > > >>>>>>>>>> and
> > > > > > >>>>>>>>>>>>>> either a
> > > > > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > > > > > >> in
> > > > > > >>>> any
> > > > > > >>>>>>> group
> > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > > > >>> TaskGroup
> > > > > > >>>>> and
> > > > > > >>>>>>>>>> either
> > > > > > >>>>>>>>>>>>> other
> > > > > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > > > >> as
> > > > > > >>> a
> > > > > > >>>>>> single
> > > > > > >>>>>>>>>>>> "object",
> > > > > > >>>>>>>>>>>>>> but
> > > > > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > > > >>>>> "status"
> > > > > > >>>>>>> of a
> > > > > > >>>>>>>>>>>>> TaskGroup
> > > > > > >>>>>>>>>>>>>>> was
> > > > > > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> I agree with Chris:
> > > > > > >>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > > > > > >>> executor), I
> > > > > > >>>>>> think
> > > > > > >>>>>>>>>>> TaskGroup
> > > > > > >>>>>>>>>>>>>>> should
> > > > > > >>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> > > > > > >> to
> > > > > > >>>>>>> implement
> > > > > > >>>>>>>>>> some
> > > > > > >>>>>>>>>>>>>> metadata
> > > > > > >>>>>>>>>>>>>>>> operations that allows start/stop a group of
> > > > > > >>> tasks
> > > > > > >>>>>> etc.)
> > > > > > >>>>>>>>>>>>>>>> - From the UI's View, it should be able to pick
> > > > > > >>> up
> > > > > > >>>>> the
> > > > > > >>>>>>>>>> individual
> > > > > > >>>>>>>>>>>>>> tasks'
> > > > > > >>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > > > > > >> status
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> Bin
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > > > > > >> Imberman
> > > > > > >>> <
> > > > > > >>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator
> > > > > > >>> to
> > > > > > >>>>> tie
> > > > > > >>>>>>> dags
> > > > > > >>>>>>>>>>>> together
> > > > > > >>>>>>>>>>>>>> but
> > > > > > >>>>>>>>>>>>>>> I
> > > > > > >>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if we
> > > > > > >>>> could
> > > > > > >>>>>>>>>> essentially
> > > > > > >>>>>>>>>>>>> write
> > > > > > >>>>>>>>>>>>>>> in
> > > > > > >>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > > > > > >>>> starter-tasks
> > > > > > >>>>>> for
> > > > > > >>>>>>>>> that
> > > > > > >>>>>>>>>>> DAG.
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
> > > > > > >> UI
> > > > > > >>>>>> concept.
> > > > > > >>>>>>>> It
> > > > > > >>>>>>>>>>>> doesn’t
> > > > > > >>>>>>>>>>>>>> need
> > > > > > >>>>>>>>>>>>>>>>> to execute separately, you’re just adding more
> > > > > > >>>> tasks
> > > > > > >>>>>> to
> > > > > > >>>>>>>> the
> > > > > > >>>>>>>>>>> queue
> > > > > > >>>>>>>>>>>>> that
> > > > > > >>>>>>>>>>>>>>> will
> > > > > > >>>>>>>>>>>>>>>>> be executed when there are resources
> > > > > > >> available.
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> via Newton Mail [
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2
> > > > > > >>>>>>>>>>>>>>>>> ]
> > > > > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
> > > > > > >> <
> > > > > > >>>>>>>>>>> chris@crpalmer.com
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
> > > > > > >>>>>> abstraction.
> > > > > > >>>>>>> I
> > > > > > >>>>>>>>>> think
> > > > > > >>>>>>>>>>>> what
> > > > > > >>>>>>>>>>>>>> is
> > > > > > >>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
> > > > > > >> high
> > > > > > >>>>> level
> > > > > > >>>>>> I
> > > > > > >>>>>>>>> think
> > > > > > >>>>>>>>>>> you
> > > > > > >>>>>>>>>>>>> want
> > > > > > >>>>>>>>>>>>>>>>> this
> > > > > > >>>>>>>>>>>>>>>>> functionality:
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in
> > > > > > >>> the
> > > > > > >>>>>> same
> > > > > > >>>>>>>>>>> TaskGroup,
> > > > > > >>>>>>>>>>>>> but
> > > > > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in a
> > > > > > >>>>>> TaskGroup
> > > > > > >>>>>>>> and
> > > > > > >>>>>>>>>>>> either
> > > > > > >>>>>>>>>>>>> a
> > > > > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not in
> > > > > > >>> any
> > > > > > >>>>>> group
> > > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > > > >>> TaskGroup
> > > > > > >>>>> and
> > > > > > >>>>>>>> either
> > > > > > >>>>>>>>>>> other
> > > > > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > > > >> as a
> > > > > > >>>>>> single
> > > > > > >>>>>>>>>>> "object",
> > > > > > >>>>>>>>>>>>> but
> > > > > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > > > >>>> "status"
> > > > > > >>>>>> of
> > > > > > >>>>>>> a
> > > > > > >>>>>>>>>>>> TaskGroup
> > > > > > >>>>>>>>>>>>>> was
> > > > > > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
> > > > > > >>> object
> > > > > > >>>>>> with
> > > > > > >>>>>>>> its
> > > > > > >>>>>>>>>> own
> > > > > > >>>>>>>>>>>>>> database
> > > > > > >>>>>>>>>>>>>>>>> table and model or just another attribute on
> > > > > > >>>> tasks.
> > > > > > >>>>> I
> > > > > > >>>>>>>> think
> > > > > > >>>>>>>>>> you
> > > > > > >>>>>>>>>>>>> could
> > > > > > >>>>>>>>>>>>>>>>> build
> > > > > > >>>>>>>>>>>>>>>>> it in a way such that from the schedulers
> > > > > > >> point
> > > > > > >>> of
> > > > > > >>>>>> view
> > > > > > >>>>>>> a
> > > > > > >>>>>>>>> DAG
> > > > > > >>>>>>>>>>> with
> > > > > > >>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
> > > > > > >> differently.
> > > > > > >>> So
> > > > > > >>>>> it
> > > > > > >>>>>>>> really
> > > > > > >>>>>>>>>>> just
> > > > > > >>>>>>>>>>>>>>> becomes
> > > > > > >>>>>>>>>>>>>>>>> a
> > > > > > >>>>>>>>>>>>>>>>> shortcut for setting dependencies between sets
> > > > > > >>> of
> > > > > > >>>>>> Tasks,
> > > > > > >>>>>>>> and
> > > > > > >>>>>>>>>>>> allows
> > > > > > >>>>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>> UI
> > > > > > >>>>>>>>>>>>>>>>> to simplify the render of the DAG structure.
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> Chris
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > > > > >>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
> > > > > > >> the
> > > > > > >>>> more
> > > > > > >>>>>>>>> important
> > > > > > >>>>>>>>>>>> issue
> > > > > > >>>>>>>>>>>>>> to
> > > > > > >>>>>>>>>>>>>>>>> fix),
> > > > > > >>>>>>>>>>>>>>>>>> but I am still convinced Ash' idea is the
> > > > > > >>> right
> > > > > > >>>>> way
> > > > > > >>>>>>>>> forward
> > > > > > >>>>>>>>>>>> (just
> > > > > > >>>>>>>>>>>>> it
> > > > > > >>>>>>>>>>>>>>>>> might
> > > > > > >>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
> > > > > > >>> adding
> > > > > > >>>>>>> visual
> > > > > > >>>>>>>>>>> grouping
> > > > > > >>>>>>>>>>>>> in
> > > > > > >>>>>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>>>> UI).
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>> There was a previous thread about this FYI
> > > > > > >>> with
> > > > > > >>>>> more
> > > > > > >>>>>>>>> context
> > > > > > >>>>>>>>>>> on
> > > > > > >>>>>>>>>>>>> why
> > > > > > >>>>>>>>>>>>>>>>> subdags
> > > > > > >>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>
> > https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > > > >>>>>>>>>>>>>> . A
> > > > > > >>>>>>>>>>>>>>>>>> solution I outline there to Jame's problem
> > > > > > >> is
> > > > > > >>>> e.g.
> > > > > > >>>>>>>>> enabling
> > > > > > >>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>> operator
> > > > > > >>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as
> > > > > > >>>> well. I
> > > > > > >>>>>> see
> > > > > > >>>>>>>>> this
> > > > > > >>>>>>>>>>>> being
> > > > > > >>>>>>>>>>>>>>>>> separate
> > > > > > >>>>>>>>>>>>>>>>>> from Ash' solution for DAG grouping in the
> > > > > > >> UI
> > > > > > >>>> but
> > > > > > >>>>>> one
> > > > > > >>>>>>> of
> > > > > > >>>>>>>>> the
> > > > > > >>>>>>>>>>> two
> > > > > > >>>>>>>>>>>>>> items
> > > > > > >>>>>>>>>>>>>>>>>> required to replace all existing subdag
> > > > > > >>>>>> functionality.
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
> > > > > > >> and
> > > > > > >>>>> they
> > > > > > >>>>>>> are
> > > > > > >>>>>>>>>>> always a
> > > > > > >>>>>>>>>>>>>> giant
> > > > > > >>>>>>>>>>>>>>>>> pain
> > > > > > >>>>>>>>>>>>>>>>>> to use. They are a constant source of user
> > > > > > >>>>> confusion
> > > > > > >>>>>>> and
> > > > > > >>>>>>>>>>>> breakages
> > > > > > >>>>>>>>>>>>>>>>> during
> > > > > > >>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
> > > > > > >> Coder <
> > > > > > >>>>>>>>>>>> jcoder01@gmail.com>
> > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a
> > > > > > >> UI
> > > > > > >>>>>>> concept. I
> > > > > > >>>>>>>>> use
> > > > > > >>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>> subdag
> > > > > > >>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If
> > > > > > >>> you
> > > > > > >>>>>> have a
> > > > > > >>>>>>>>> group
> > > > > > >>>>>>>>>>> of
> > > > > > >>>>>>>>>>>>>> tasks
> > > > > > >>>>>>>>>>>>>>>>> that
> > > > > > >>>>>>>>>>>>>>>>>>> need to finish before another group of
> > > > > > >> tasks
> > > > > > >>>>>> start,
> > > > > > >>>>>>>>> using
> > > > > > >>>>>>>>>> a
> > > > > > >>>>>>>>>>>>> subdag
> > > > > > >>>>>>>>>>>>>>> is
> > > > > > >>>>>>>>>>>>>>>>> a
> > > > > > >>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies
> > > > > > >>>> and I
> > > > > > >>>>>>> think
> > > > > > >>>>>>>>>> also
> > > > > > >>>>>>>>>>>> make
> > > > > > >>>>>>>>>>>>>> it
> > > > > > >>>>>>>>>>>>>>>>>> easier
> > > > > > >>>>>>>>>>>>>>>>>>> to follow the dag code.
> > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
> > > > > > >> Hamlin
> > > > > > >>> <
> > > > > > >>>>>>>>>>>>> hamlin.kn@gmail.com>
> > > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > > > > >>>>>> Berlin-Taylor
> > > > > > >>>>>>> <
> > > > > > >>>>>>>>>>>>>> ash@apache.org
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> Question:
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator
> > > > > > >>>> anymore?
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just
> > > > > > >>>>> replacing
> > > > > > >>>>>> it
> > > > > > >>>>>>>>> with
> > > > > > >>>>>>>>>> a
> > > > > > >>>>>>>>>>> UI
> > > > > > >>>>>>>>>>>>>>>>> grouping
> > > > > > >>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less
> > > > > > >> to
> > > > > > >>>> get
> > > > > > >>>>>>>> wrong,
> > > > > > >>>>>>>>>> and
> > > > > > >>>>>>>>>>>>> closer
> > > > > > >>>>>>>>>>>>>>> to
> > > > > > >>>>>>>>>>>>>>>>>> what
> > > > > > >>>>>>>>>>>>>>>>>>>>> users actually want to achieve with
> > > > > > >>>> subdags?
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in
> > > > > > >>>> subdags
> > > > > > >>>>>>> could
> > > > > > >>>>>>>>>> start
> > > > > > >>>>>>>>>>>>>> running
> > > > > > >>>>>>>>>>>>>>> in
> > > > > > >>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should
> > > > > > >> we
> > > > > > >>>> not
> > > > > > >>>>>>> also
> > > > > > >>>>>>>>> just
> > > > > > >>>>>>>>>>>>>>> _enitrely_
> > > > > > >>>>>>>>>>>>>>>>>>> remove
> > > > > > >>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace
> > > > > > >> it
> > > > > > >>>> with
> > > > > > >>>>>>>>> something
> > > > > > >>>>>>>>>>>>>> simpler.
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I
> > > > > > >>> haven't
> > > > > > >>>>> used
> > > > > > >>>>>>>> them
> > > > > > >>>>>>>>>>>>>> extensively
> > > > > > >>>>>>>>>>>>>>> so
> > > > > > >>>>>>>>>>>>>>>>>> may
> > > > > > >>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> > > > > > >>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it
> > > > > > >>>> has(?)
> > > > > > >>>>> to
> > > > > > >>>>>>> be
> > > > > > >>>>>>>> of
> > > > > > >>>>>>>>>> the
> > > > > > >>>>>>>>>>>>> form
> > > > > > >>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
> > > > > > >>>>>>>>>>>>>>>>>>>>> - They need their own
> > > > > > >> schedule_interval,
> > > > > > >>>> but
> > > > > > >>>>>> it
> > > > > > >>>>>>>> has
> > > > > > >>>>>>>>> to
> > > > > > >>>>>>>>>>>> match
> > > > > > >>>>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>>>> parent
> > > > > > >>>>>>>>>>>>>>>>>>>> dag
> > > > > > >>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own.
> > > > > > >>>> (Does
> > > > > > >>>>>> it
> > > > > > >>>>>>>> make
> > > > > > >>>>>>>>>>> sense
> > > > > > >>>>>>>>>>>>> to
> > > > > > >>>>>>>>>>>>>> do
> > > > > > >>>>>>>>>>>>>>>>>> this?
> > > > > > >>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the
> > > > > > >>> sub
> > > > > > >>>>> dag
> > > > > > >>>>>>>> would
> > > > > > >>>>>>>>>>> never
> > > > > > >>>>>>>>>>>>>>>>> execute, so
> > > > > > >>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.
> > > > > > >>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to
> > > > > > >>>>> operator a
> > > > > > >>>>>>>>> subdag
> > > > > > >>>>>>>>>>> with
> > > > > > >>>>>>>>>>>>> --
> > > > > > >>>>>>>>>>>>>>>>> always
> > > > > > >>>>>>>>>>>>>>>>>> a
> > > > > > >>>>>>>>>>>>>>>>>>>>> bit of a kludge.
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> Thoughts?
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> -ash
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash
> > > > > > >>>>>> Berlin-Taylor <
> > > > > > >>>>>>>>>>>>>> ash@apache.org>
> > > > > > >>>>>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>> Workon sub-dags is much needed, I'm
> > > > > > >>>>> excited
> > > > > > >>>>>> to
> > > > > > >>>>>>>> see
> > > > > > >>>>>>>>>> how
> > > > > > >>>>>>>>>>>>> this
> > > > > > >>>>>>>>>>>>>>>>>>> progresses.
> > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > > > > >>> parsing*:
> > > > > > >>>>> This
> > > > > > >>>>>>>>>> rewrites
> > > > > > >>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > > > > >>> parsing,
> > > > > > >>>>> and
> > > > > > >>>>>> it
> > > > > > >>>>>>>>> will
> > > > > > >>>>>>>>>>>> give a
> > > > > > >>>>>>>>>>>>>>> flat
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation
> > > > > > >>>> already
> > > > > > >>>>>> does
> > > > > > >>>>>>>>> this
> > > > > > >>>>>>>>>> I
> > > > > > >>>>>>>>>>>>> think.
> > > > > > >>>>>>>>>>>>>>> At
> > > > > > >>>>>>>>>>>>>>>>>> least
> > > > > > >>>>>>>>>>>>>>>>>>>> if
> > > > > > >>>>>>>>>>>>>>>>>>>>>> I've understood your idea here
> > > > > > >>>> correctly.
> > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>> -ash
> > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin
> > > > > > >>>> Huang <
> > > > > > >>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone and
> > > > > > >>>> collect
> > > > > > >>>>>>>>> feedback
> > > > > > >>>>>>>>>> on
> > > > > > >>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>>> AIP-34
> > > > > > >>>>>>>>>>>>>>>>>> on
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was
> > > > > > >>>>>> previously
> > > > > > >>>>>>>>>> briefly
> > > > > > >>>>>>>>>>>>>>>>> mentioned in
> > > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be
> > > > > > >>> done
> > > > > > >>>>> for
> > > > > > >>>>>>>>> Airflow
> > > > > > >>>>>>>>>>> 2.0,
> > > > > > >>>>>>>>>>>>> and
> > > > > > >>>>>>>>>>>>>>>>> one of
> > > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator
> > > > > > >>> attach
> > > > > > >>>>>> tasks
> > > > > > >>>>>>>> back
> > > > > > >>>>>>>>>> to
> > > > > > >>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>> root
> > > > > > >>>>>>>>>>>>>>>>> DAG.
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving
> > > > > > >>>>>> SubDagOperator
> > > > > > >>>>>>>>>> related
> > > > > > >>>>>>>>>>>>>> issues
> > > > > > >>>>>>>>>>>>>>> by
> > > > > > >>>>>>>>>>>>>>>>>>>>> reattaching
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag
> > > > > > >> while
> > > > > > >>>>>>> respecting
> > > > > > >>>>>>>>>>>>>> dependencies
> > > > > > >>>>>>>>>>>>>>>>>> during
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping
> > > > > > >> effect
> > > > > > >>>> on
> > > > > > >>>>>> the
> > > > > > >>>>>>> UI
> > > > > > >>>>>>>>>> will
> > > > > > >>>>>>>>>>> be
> > > > > > >>>>>>>>>>>>>>>>> achieved
> > > > > > >>>>>>>>>>>>>>>>>>>> through
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory
> > > > > > >>>> function
> > > > > > >>>>>> more
> > > > > > >>>>>>>>>>> reusable
> > > > > > >>>>>>>>>>>>>>> because
> > > > > > >>>>>>>>>>>>>>>>> you
> > > > > > >>>>>>>>>>>>>>>>>>>> don't
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and
> > > > > > >>>>>>> child_dag_name
> > > > > > >>>>>>>>> in
> > > > > > >>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>> function
> > > > > > >>>>>>>>>>>>>>>>>>>>> signature
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> anymore.
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > > > > >>> parsing*:
> > > > > > >>>>> This
> > > > > > >>>>>>>>>> rewrites
> > > > > > >>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > > > > >>> parsing,
> > > > > > >>>>> and
> > > > > > >>>>>> it
> > > > > > >>>>>>>>> will
> > > > > > >>>>>>>>>>>> give a
> > > > > > >>>>>>>>>>>>>>> flat
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The
> > > > > > >> new
> > > > > > >>>>>>>>> SubDagOperator
> > > > > > >>>>>>>>>>>> acts
> > > > > > >>>>>>>>>>>>>>> like a
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> container and most of the original
> > > > > > >>>>> methods
> > > > > > >>>>>>> are
> > > > > > >>>>>>>>>>> removed.
> > > > > > >>>>>>>>>>>>> The
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> signature is
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> also changed to *subdag_factory
> > > > > > >> *with
> > > > > > >>>>>>>>> *subdag_args
> > > > > > >>>>>>>>>>> *and
> > > > > > >>>>>>>>>>>>>>>>>>>>> *subdag_kwargs*.
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> This is similar to the
> > > > > > >> PythonOperator
> > > > > > >>>>>>>> signature.
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add
> > > > > > >>>>>>> current_group
> > > > > > >>>>>>>> &
> > > > > > >>>>>>>>>>>>>> parent_group
> > > > > > >>>>>>>>>>>>>>>>>>>>> attributes
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> to BaseOperator*: This metadata is
> > > > > > >>> used
> > > > > > >>>>> to
> > > > > > >>>>>>>> group
> > > > > > >>>>>>>>>>> tasks
> > > > > > >>>>>>>>>>>>> for
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> rendering at
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> UI level. It may potentially extend
> > > > > > >>>>> further
> > > > > > >>>>>>> to
> > > > > > >>>>>>>>>> group
> > > > > > >>>>>>>>>>>>>>> arbitrary
> > > > > > >>>>>>>>>>>>>>>>>>> tasks
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to
> > > > > > >>> allow
> > > > > > >>>>>>>>> group-level
> > > > > > >>>>>>>>>>>>>> operations
> > > > > > >>>>>>>>>>>>>>>>>>> (i.e.
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> stop/trigger a group of task within
> > > > > > >>> the
> > > > > > >>>>>> dag)
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*:
> > > > > > >> Proposed
> > > > > > >>>> UI
> > > > > > >>>>>>>>>> modification
> > > > > > >>>>>>>>>>>> to
> > > > > > >>>>>>>>>>>>>>> allow
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a
> > > > > > >>>> flat
> > > > > > >>>>>>>>> structure
> > > > > > >>>>>>>>>> to
> > > > > > >>>>>>>>>>>>> pair
> > > > > > >>>>>>>>>>>>>>> with
> > > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > > >>>>>>>>>>>>>>>>>>>>> first
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> change instead of the original
> > > > > > >>>>> hierarchical
> > > > > > >>>>>>>>>>> structure.
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Please see related documents and
> > > > > > >> PRs
> > > > > > >>>> for
> > > > > > >>>>>>>> details:
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> AIP:
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Original Issue:
> > > > > > >>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/8078
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Draft PR:
> > > > > > >>>>>>>>>>> https://github.com/apache/airflow/pull/9243
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any
> > > > > > >>>>> aspects
> > > > > > >>>>>>>> that
> > > > > > >>>>>>>>>> you
> > > > > > >>>>>>>>>>>>>>>>>> agree/disagree
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> with or
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> need more clarification (especially
> > > > > > >>> the
> > > > > > >>>>>> third
> > > > > > >>>>>>>>>> change
> > > > > > >>>>>>>>>>>>>>> regarding
> > > > > > >>>>>>>>>>>>>>>>>>>>> TaskGroup).
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Any comments are welcome and I am
> > > > > > >>>> looking
> > > > > > >>>>>>>> forward
> > > > > > >>>>>>>>>> to
> > > > > > >>>>>>>>>>>> it!
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > > > > >>>>>>>>>>>>>>>>>>>>>>> Bin
> > > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>> --
> > > > > > >>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>> --
> > > > > > >>>>>>>>>>>>>>> Thanks & Regards
> > > > > > >>>>>>>>>>>>>>> Poornima
> > > > > > >>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> --
> > > > > > >>>>>>>
> > > > > > >>>>>>> Jarek Potiuk
> > > > > > >>>>>>> Polidea <https://www.polidea.com/> | Principal Software
> > > > Engineer
> > > > > > >>>>>>>
> > > > > > >>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > > > > >>>>> <+48%20660%20796%20129>>
> > > > > > >>>>>>> [image: Polidea] <https://www.polidea.com/>
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>>>
> > > > > > >>>>> --
> > > > > > >>>>>
> > > > > > >>>>> Jarek Potiuk
> > > > > > >>>>> Polidea <https://www.polidea.com/> | Principal Software
> > > Engineer
> > > > > > >>>>>
> > > > > > >>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > > > > >>>>> <+48%20660%20796%20129>>
> > > > > > >>>>> [image: Polidea] <https://www.polidea.com/>
> > > > > > >>>>>
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> --
> > > > > > >>>>
> > > > > > >>>> *Jacob Ferriero*
> > > > > > >>>>
> > > > > > >>>> Strategic Cloud Engineer: Data Engineering
> > > > > > >>>>
> > > > > > >>>> jferriero@google.com
> > > > > > >>>>
> > > > > > >>>> 617-714-2509
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
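
Chris Palmer's TaskGroup outline, quoted earlier in this thread, treats a
group mainly as a shortcut for setting dependencies between sets of tasks.
A minimal sketch of that idea in plain Python follows. It does not use
Airflow, and every name in it (Task, TaskGroup, roots, leaves, link) is
hypothetical and chosen only for illustration; it is not the implementation
in either PR.

```python
class Task:
    """Toy task that only records its neighbours (hypothetical, not Airflow's BaseOperator)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def roots(self):
        return [self]

    def leaves(self):
        return [self]

    def __rshift__(self, other):
        return link(self, other)


class TaskGroup:
    """Toy group: a named collection of tasks with no scheduling role of its own."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = []

    def add(self, task):
        self.tasks.append(task)
        return task

    def roots(self):
        # Members with no upstream task inside the same group.
        members = set(self.tasks)
        return [t for t in self.tasks if not (t.upstream & members)]

    def leaves(self):
        # Members with no downstream task inside the same group.
        members = set(self.tasks)
        return [t for t in self.tasks if not (t.downstream & members)]

    def __rshift__(self, other):
        return link(self, other)


def link(left, right):
    """Make every leaf of `left` an upstream of every root of `right`."""
    for up in left.leaves():
        for down in right.roots():
            up.downstream.add(down)
            down.upstream.add(up)
    return right


extract = TaskGroup("extract")
a = extract.add(Task("extract.a"))
b = extract.add(Task("extract.b"))
c = extract.add(Task("extract.c"))
a >> c                 # dependency between tasks in the same group
load = Task("load")    # a task that is not in any group
extract >> load        # group-level dependency, expanded to task-level edges

print(sorted(t.task_id for t in load.upstream))
```

In this sketch the group never reaches a scheduler: `extract >> load` only
adds ordinary task-to-task edges from the group's leaves, so a DAG with
groups would look the same to the backend as a flat DAG.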

Re: [AIP-34] Rewrite SubDagOperator

Posted by Gerard Casas Saez <gc...@twitter.com.INVALID>.
Re graph times. That makes sense. Let me know what you find. We may be able
to contribute on the lazy loading part.
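
For reference, the effect Qian describes in the quoted message below (all
task data is loaded, but collapsed groups are drawn as single nodes until
expanded) can be sketched roughly as follows. This is an illustration in
Python only; the actual UI code in the PR is JavaScript, and the function
and variable names here are hypothetical.

```python
def visible_graph(task_to_group, edges, expanded):
    """Return the (nodes, edges) to draw for the given set of expanded groups.

    task_to_group maps each task id to its group id, or None if ungrouped.
    edges is a list of (upstream_task_id, downstream_task_id) pairs.
    """

    def node_for(task_id):
        group = task_to_group[task_id]
        if group is None or group in expanded:
            return task_id      # draw the task itself
        return group            # draw the collapsed group instead

    nodes = {node_for(task_id) for task_id in task_to_group}
    drawn_edges = {
        (node_for(up), node_for(down))
        for up, down in edges
        if node_for(up) != node_for(down)   # hide edges inside a collapsed group
    }
    return nodes, drawn_edges


task_to_group = {
    "start": None,
    "e1": "extract", "e2": "extract", "e3": "extract",
    "t1": "transform", "t2": "transform",
    "end": None,
}
edges = [
    ("start", "e1"), ("start", "e2"), ("start", "e3"),
    ("e1", "t1"), ("e2", "t1"), ("e3", "t2"),
    ("t1", "end"), ("t2", "end"),
]

collapsed = visible_graph(task_to_group, edges, expanded=set())
opened = visible_graph(task_to_group, edges, expanded={"extract"})
print(len(collapsed[0]), len(collapsed[1]))   # 4 nodes, 3 edges
print(len(opened[0]), len(opened[1]))         # 6 nodes, 7 edges
```

The flat version of this toy DAG has 7 nodes and 8 edges, so the saving is
small here; the point is only that the layout work scales with what is
drawn, not with what was loaded.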

Looking forward to seeing the updated AIP!


Gerard Casas Saez
Twitter | Cortex | @casassaez <http://twitter.com/casassaez>


On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <ka...@gmail.com> wrote:

> Permissions granted, let me know if you face any issues.
>
> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yu...@gmail.com> wrote:
>
> > Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> >
> > On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > > What's your ID? If you haven't created an account yet, please create
> > > one at https://cwiki.apache.org/confluence/signup.action and send us
> > > your ID and we will add permissions.
> > >
> > > > Thanks. I'll edit the AIP. May I request permission to edit it?
> > > > My wiki user email is yuqian1990@gmail.com.
> > >
> > >
> > > On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yu...@gmail.com> wrote:
> > >
> > > > Re Xinbin: Thanks. I'll edit the AIP. May I request permission to edit
> > > > it? My wiki user email is yuqian1990@gmail.com.
> > > >
> > > > Re Gerard: yes the UI loads all the nodes as json from the web server at
> > > > once. However, it only adds the top level nodes and edges to the graph
> > > > when the Graph View page is first opened, and then adds the expanded
> > > > nodes to the graph as the user expands them. From what I've experienced
> > > > with DAGs containing around 400 tasks (not using TaskGroup or
> > > > SubDagOperator), opening the whole dag in Graph View usually takes 5
> > > > seconds. Less than 60ms of that is taken by loading the data from
> > > > webserver. The remaining 4.9s+ is taken by javascript functions in
> > > > dagre-d3.min.js such as createNodes, createEdgeLabels, etc and by
> > > > rendering the graph. With TaskGroup being used to group tasks into a
> > > > smaller number of top-level nodes, the amount of data loaded from
> > > > webserver will remain about the same compared to a flat dag of the same
> > > > size, but the number of nodes and edges that need to be plotted on the
> > > > graph can be reduced significantly. So in theory this should reduce the
> > > > time it takes to open Graph View even without lazy-loading the data
> > > > (I'll experiment to find out). That said, if it comes to a point where
> > > > lazy-loading helps, we can still implement it as an improvement.
> > > >
> > > > Re James: the Tree View looks as if all the groups are fully expanded
> > > > (because under the hood all the tasks are in a single DAG). I'm less
> > > > worried about Tree View at the moment because it already has a mechanism
> > > > for collapsing tasks by the dependency tree. That said, the Tree View
> > > > can definitely be improved too with TaskGroup (e.g. collapse tasks in
> > > > the same TaskGroup when Tree View is first opened).
> > > >
> > > > For both suggestions, implementing them doesn't require fundamental
> > > > changes to the idea. I think we can have a basic working TaskGroup
> > > > first, and then improve it incrementally in several PRs as we get more
> > > > feedback from the community. What do you think?
> > > >
> > > > Qian
> > > >
> > > >
> > > > > On Wed, Aug 12, 2020 at 9:15 AM James Coder <jc...@gmail.com> wrote:
> > > > >
> > > > > > I agree this looks great. One question: how does the tree view look?
> > > > >
> > > > > James Coder
> > > > >
> > > > > > On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <
> > > gcasassaez@twitter.com
> > > > .invalid>
> > > > > wrote:
> > > > > >
> > > > > > First of all, this is awesome!!
> > > > > >
> > > > > > Secondly, checking your UI code, seems you are loading all
> > operators
> > > at
> > > > > > once. Wondering if we can load them as needed (aka load whenever
> we
> > > > click
> > > > > > the TaskGroup). Some of our DAGs are so large that take forever
> to
> > > load
> > > > > on
> > > > > > the Graph view, so worried about this still being an issue here.
> It
> > > may
> > > > > be
> > > > > > easily solvable by implementing lazy loading of the graph. Not
> sure
> > > how
> > > > > > > easy to implement/add to the UI extension (and I don't want to push
> > > > > > > for early optimization, as it's the root of all evil).
> > > > > > Gerard Casas Saez
> > > > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > > >
> > > > > >
> > > > > >> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <
> > > bin.huangxb@gmail.com>
> > > > > wrote:
> > > > > >>
> > > > > >> Hi Yu,
> > > > > >>
> > > > > >> Thank you so much for taking on this. I was fairly distracted
> > > > previously
> > > > > >> and I didn't have the time to update the proposal. In fact,
> after
> > > > > >> discussing with Ash, Kaxil and Daniel, the direction of this AIP
> > has
> > > > > been
> > > > > >> changed to favor the concept of TaskGroup instead of rewriting
> > > > > > >> SubDagOperator (though it may make sense to deprecate SubDag at a
> > > > > > >> future date).
> > > > > >>
> > > > > > >> Your PR is amazing and it has implemented the desired features. I
> > > think
> > > > > we
> > > > > >> can focus on your new PR instead. Do you mind updating the AIP
> > based
> > > > on
> > > > > >> what you have done in your PR?
> > > > > >>
> > > > > >> Best,
> > > > > >> Bin
> > > > > >>
> > > > > >>
> > > > > >>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yu...@gmail.com>
> > > > wrote:
> > > > > >>>
> > > > > >>> Hi, all, I've added the basic UI changes to my proposed
> > > > implementation
> > > > > of
> > > > > >>> TaskGroup as UI grouping concept:
> > > > > >>> https://github.com/apache/airflow/pull/10153
> > > > > >>>
> > > > > >>> I think Chris had a pretty good specification of TaskGroup so
> I'm
> > > > > quoting
> > > > > >>> it here. The only thing I don't fully agree with is the
> > restriction
> > > > > > >>> "... *cannot* have dependencies between a Task in a TaskGroup and
> > > > > > >>> either a Task in a different TaskGroup or a Task not in any group".
> > > > > > >>> I think this is overly restrictive. Since TaskGroup is a UI concept,
> tasks
> > > can
> > > > > have
> > > > > >>> dependencies on tasks in other TaskGroup or not in any
> TaskGroup.
> > > In
> > > > my
> > > > > >> PR,
> > > > > >>> this is allowed. The graph edges will update accordingly when
> > > > > TaskGroups
> > > > > >>> are expanded/collapsed. TaskGroup is only helping to make the
> UI
> > > look
> > > > > >> less
> > > > > >>> crowded. Under the hood, everything is still a DAG of tasks and
> > > edges
> > > > > so
> > > > > >>> things work normally. Here's a screenshot
> > > > > >>> <
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif
> > > > > >>>>
> > > > > >>> of the UI interaction.
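
To make the "edges update when TaskGroups are expanded/collapsed" idea concrete, here is a minimal sketch in plain Python. It is not the code from the PR; `collapse_edges`, `group_of` and `collapsed` are hypothetical names, and the model assumes each task belongs to at most one (non-nested) group.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def collapse_edges(
    edges: Iterable[Edge],
    group_of: Dict[str, str],
    collapsed: Set[str],
) -> Set[Edge]:
    """Return the edges to draw when the groups in `collapsed` are folded.

    `group_of` maps a task id to its group id (tasks outside any group are
    simply absent). A task in a collapsed group is drawn as the group node.
    """

    def node(task_id: str) -> str:
        group = group_of.get(task_id)
        return group if group in collapsed else task_id

    drawn: Set[Edge] = set()
    for upstream, downstream in edges:
        u, d = node(upstream), node(downstream)
        if u != d:  # edges internal to a collapsed group disappear
            drawn.add((u, d))
    return drawn


# The underlying DAG never changes; only the drawn edges do.
edges = [("start", "a1"), ("a1", "a2"), ("a2", "b1"), ("b1", "end")]
group_of = {"a1": "group_a", "a2": "group_a", "b1": "group_b"}
print(sorted(collapse_edges(edges, group_of, collapsed={"group_a"})))
```

Because the function only rewrites endpoints, a dependency between a task inside a group and a task outside it needs no special handling, which is the point made above about not restricting cross-group dependencies.
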
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> *   - Tasks can be added to a TaskGroup   - You *can* have
> > > > dependencies
> > > > > >>> between Tasks in the same TaskGroup, but   *cannot* have
> > > dependencies
> > > > > >>> between a Task in a TaskGroup and either a   Task in a
> different
> > > > > >> TaskGroup
> > > > > >>> or a Task not in any group   - You *can* have dependencies
> > between
> > > a
> > > > > >>> TaskGroup and either other   TaskGroups or Tasks not in any
> group
> > >  -
> > > > > The
> > > > > >>> UI will by default render a TaskGroup as a single "object", but
> > > >  which
> > > > > >> you
> > > > > >>> expand or zoom into in some way   - You'd need some way to
> > > determine
> > > > > what
> > > > > >>> the "status" of a TaskGroup was   at least for UI display
> > purposes*
> > > > > >>>
> > > > > >>>
> > > > > >>> Regarding Jake's comment, I agree it's possible to implement
> the
> > > > > >> "retrying
> > > > > >>> tasks in a group" pattern he mentioned as an optional feature
> of
> > > > > >> TaskGroup
> > > > > >>> although that may go against having TaskGroup as a pure UI
> > concept.
> > > > For
> > > > > >> the
> > > > > >>> motivating example Jake provided, I suggest implementing both
> > > > > >>> SubmitLongRunningJobTask and PollJobStatusSensor in a single
> > > > operator.
> > > > > It
> > > > > >>> can do something like BaseSensorOperator.execute() does in
> > > > "reschedule"
> > > > > >>> mode, i.e. it first executes some code to submit the long
> running
> > > job
> > > > > to
> > > > > > >>> the external service and stores the state (e.g. in XCom). Then it
> > > > > > >>> reschedules itself. Subsequent runs then poke for the completion state.
> > > > > >>>
> > > > > >>>
> > > > > >>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> > > > > >> <jferriero@google.com.invalid
> > > > > >>>>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> I really like this idea of a TaskGroup container as I think
> this
> > > > will
> > > > > >> be
> > > > > >>>> much easier to use than SubDag.
> > > > > >>>>
> > > > > >>>> I'd like to propose an optional behavior for special retry
> > > mechanics
> > > > > >> via
> > > > > >>> a
> > > > > >>>> TaskGroup.retry_all property.
> > > > > >>>> This way I could use TaskGroup to replace my favorite use of
> > > SubDag
> > > > > for
> > > > > >>>> atomically retrying tasks of the pattern "act on external
> state
> > > then
> > > > > >>>> reschedule poll until desired state reached".
> > > > > >>>>
> > > > > > >>>> The motivating use case I have for a SubDag is a very simple two-task group
> > > > > >>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > > > > >>>> I use SubDag because it gives me an easy way to retry the
> > > > > >>> SubmitJobTask
> > > > > >>>> if something about the PollJobSensor fails.
> > > > > >>>> This pattern would be really nice for jobs that are expected
> to
> > > run
> > > > a
> > > > > >>> long
> > > > > > >>>> time (because the sensor can use reschedule mode, freeing up slots)
> > > > > >>>> but might fail for a retryable reason.
> > > > > >>>> However, using SubDag to meet this use case defeats the
> purpose
> > > > > because
> > > > > >>>> SubDag infamously
> > > > > >>>> <
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10
> > > > > >>>>>
> > > > > >>>> blocks a "controller" slot for the entire duration.
> > > > > > >>>> This may feel like a cyclic behavior but in reality it is very
> > common
> > > > for
> > > > > >> a
> > > > > >>>> single operator to submit job / wait til done.
> > > > > > >>>> We could use this case to refactor many operators (e.g. BQ,
> > Dataproc,
> > > > > >>>> Dataflow) to be implemented as TaskGroup[SubmitTask >>
> PollTask]
> > > > with
> > > > > >> an
> > > > > >>>> optional reschedule mode if user knows that this job may take
> a
> > > long
> > > > > >>> time.
> > > > > >>>>
> > > > > > >>>> I'd be happy to do the development work on adding this specific
> > retry
> > > > > >>> behavior
> > > > > >>>> to TaskGroup once the base concept is implemented if others in
> > the
> > > > > >>>> community would find this a useful feature.
> > > > > >>>>
> > > > > >>>> Cheers,
> > > > > >>>> Jake
> > > > > >>>>
> > > > > >>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> > > > > Jarek.Potiuk@polidea.com
> > > > > >>>
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>>> All for it :) . I think we are getting closer to have regular
> > > > > >> planning
> > > > > >>>> and
> > > > > >>>>> making some structured approach to 2.0 and starting task
> force
> > > for
> > > > it
> > > > > >>>> soon,
> > > > > >>>>> so I think this should be perfectly fine to discuss and even
> > > start
> > > > > >>>>> implementing what's beyond as soon as we make sure that we
> are
> > > > > >>>> prioritizing
> > > > > >>>>> 2.0 work.
> > > > > >>>>>
> > > > > >>>>> J,
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <
> yuqian1990@gmail.com>
> > > > > >> wrote:
> > > > > >>>>>
> > > > > >>>>>> Hi Jarek,
> > > > > >>>>>>
> > > > > >>>>>> I agree we should not change the behaviour of the existing
> > > > > >>>> SubDagOperator
> > > > > >>>>>> till Airflow 2.1. Is it okay to continue the discussion
> about
> > > > > >>> TaskGroup
> > > > > >>>>> as
> > > > > >>>>>> a brand new concept/feature independent from the existing
> > > > > >>>> SubDagOperator?
> > > > > >>>>>> In other words, shall we add TaskGroup as a UI grouping
> > concept
> > > > > >> like
> > > > > >>>> Ash
> > > > > > >>>>>> suggested, and not touch SubDagOperator at all. Whenever we
> > are
> > > > > >>> ready
> > > > > >>>>> with
> > > > > >>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
> > > > > >>>>>>
> > > > > >>>>>> I really like Ash's idea of simplifying the SubDagOperator
> > idea
> > > > > >> into
> > > > > >>> a
> > > > > >>>>>> simple UI grouping concept. I think Xinbin's idea of
> > > "reattaching
> > > > > >> all
> > > > > >>>> the
> > > > > >>>>>> tasks to the root DAG" is the way to go. And I see James
> > pointed
> > > > > >> out
> > > > > >>> we
> > > > > >>>>>> need some helper functions to simplify dependencies setting
> of
> > > > > >>>> TaskGroup.
> > > > > >>>>>> Xinbin put up a pretty elegant example in his PR
> > > > > >>>>>> <https://github.com/apache/airflow/pull/9243>. I think
> having
> > > > > >>>> TaskGroup
> > > > > >>>>> as
> > > > > >>>>>> a UI concept should be a relatively small change. We can
> > > simplify
> > > > > >>>>> Xinbin's
> > > > > >>>>>> PR further. So I put up this alternative proposal here:
> > > > > >>>>>> https://github.com/apache/airflow/pull/10153
> > > > > >>>>>>
> > > > > >>>>>> I have not done any UI changes due to lack of experience
> with
> > > web
> > > > > >> UI.
> > > > > >>>> If
> > > > > >>>>>> anyone's interested, please take a look at the PR.
> > > > > >>>>>>
> > > > > >>>>>> Qian
> > > > > >>>>>>
> > > > > >>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > > > > >>> Jarek.Potiuk@polidea.com
> > > > > >>>>>
> > > > > >>>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Similar point here to the other ideas that are popping up.
> > > Maybe
> > > > > >> we
> > > > > >>>>>> should
> > > > > >>>>>>> just focus on completing 2.0 and make all discussions about
> > > > > >> further
> > > > > >>>>>>> improvements to 2.1? While those are important discussions
> > (and
> > > > > >> we
> > > > > >>>>> should
> > > > > >>>>>>> continue them in the  near future !) I think at this point
> > > > > >> focusing
> > > > > >>>> on
> > > > > >>>>>>> delivering 2.0 in its current shape should be our focus
> now ?
> > > > > >>>>>>>
> > > > > >>>>>>> J.
> > > > > >>>>>>>
> > > > > >>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > > > > >>> bin.huangxb@gmail.com>
> > > > > >>>>>>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Hi Daniel
> > > > > >>>>>>>>
> > > > > >>>>>>>> I agree that the TaskGroup should have the same API as a
> DAG
> > > > > >>> object
> > > > > >>>>>>> related
> > > > > >>>>>>>> to task dependencies, but it will not have anything
> related
> > to
> > > > > >>>> actual
> > > > > >>>>>>>> execution or scheduling.
> > > > > >>>>>>>> I will update the AIP according to this over the weekend.
> > > > > >>>>>>>>
> > > > > >>>>>>>>> We could even make a “DAGTemplate” object s.t. when you
> > > > > >> import
> > > > > >>>> the
> > > > > >>>>>>> object
> > > > > >>>>>>>> you can import it with parameters to determine the shape
> of
> > > the
> > > > > >>>> DAG.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Can you elaborate a bit more on this? Does it serve a
> > similar
> > > > > >>>> purpose
> > > > > >>>>>> as
> > > > > >>>>>>> a
> > > > > >>>>>>>> DAG factory function?
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > > > > >>>>>>> daniel.imberman@gmail.com
> > > > > >>>>>>>>>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Hi Bin,
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Why not give the TaskGroup the same API as a DAG object
> > (e.g.
> > > > > >>> the
> > > > > >>>>>>> bitwise
> > > > > > >>>>>>>>> operator for task dependencies). We could even make a
> > > > > >>>> “DAGTemplate”
> > > > > >>>>>>>> object
> > > > > >>>>>>>>> s.t. when you import the object you can import it with
> > > > > >>> parameters
> > > > > >>>>> to
> > > > > >>>>>>>>> determine the shape of the DAG.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > > > > >>>>> bin.huangxb@gmail.com
> > > > > >>>>>>>
> > > > > >>>>>>>>> wrote:
> > > > > >>>>>>>>> The TaskGroup will not take schedule interval as a
> > parameter
> > > > > >>>>> itself,
> > > > > >>>>>>> and
> > > > > >>>>>>>> it
> > > > > >>>>>>>>> depends on the DAG where it attaches to. In my opinion,
> the
> > > > > >>>>> TaskGroup
> > > > > >>>>>>>> will
> > > > > >>>>>>>>> only contain a group of tasks with interdependencies, and
> > the
> > > > > >>>>>> TaskGroup
> > > > > >>>>>>>>> behaves like a task. It doesn't contain any
> > > > > >>> execution/scheduling
> > > > > >>>>>> logic
> > > > > >>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs
> etc.)
> > > > > >>> like
> > > > > >>>> a
> > > > > >>>>>> DAG
> > > > > >>>>>>>>> does.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> For example, there is the scenario that the schedule
> > > > > >> interval
> > > > > >>>> of
> > > > > >>>>>> DAG
> > > > > >>>>>>> is
> > > > > >>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 min.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> I am curious why you ask this. Is this a use case that
> you
> > > > > >> want
> > > > > >>>> to
> > > > > >>>>>>>> achieve?
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Bin
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> > > > > >> thanosxnicholas@gmail.com
> > > > > >>>>
> > > > > >>>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> Hi Bin,
> > > > > >>>>>>>>>> Using TaskGroup, Is the schedule interval of TaskGroup
> the
> > > > > >>> same
> > > > > >>>>> as
> > > > > >>>>>>> the
> > > > > >>>>>>>>>> parent DAG? My main concern is whether the schedule
> > > > > >> interval
> > > > > >>> of
> > > > > >>>>>>>> TaskGroup
> > > > > > >>>>>>>>>> could be different from that of the DAG? For example,
> > there
> > > > > >>> is
> > > > > >>>>> the
> > > > > >>>>>>>>> scenario
> > > > > >>>>>>>>>> that the schedule interval of DAG is 1 hour and the
> > > > > >> schedule
> > > > > >>>>>> interval
> > > > > >>>>>>>> of
> > > > > >>>>>>>>>> TaskGroup is 20 min.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Cheers,
> > > > > >>>>>>>>>> Nicholas
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > > > > >>>>>> bin.huangxb@gmail.com
> > > > > >>>>>>>>
> > > > > >>>>>>>>>> wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Hi Nicholas,
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> I am not sure about the old behavior of SubDagOperator,
> > > > > >>> maybe
> > > > > >>>>> it
> > > > > >>>>>>> will
> > > > > >>>>>>>>>> throw
> > > > > >>>>>>>>>>> an error? But in the original proposal, the subdag's
> > > > > >>>>>>>> schedule_interval
> > > > > >>>>>>>>>> will
> > > > > >>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to replace
> > > > > >>>> SubDag,
> > > > > >>>>>>> there
> > > > > >>>>>>>>>> will
> > > > > >>>>>>>>>>> be no subdag schedule_interval.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Bin
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > > > > >>>> thanosxnicholas@gmail.com
> > > > > >>>>>>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> Hi Bin,
> > > > > >>>>>>>>>>>> Thanks for your good proposal. I was confused whether
> > > > > >> the
> > > > > >>>>>>> schedule
> > > > > >>>>>>>>>>>> interval of SubDAG is different from that of the
> parent
> > > > > >>>> DAG?
> > > > > >>>>> I
> > > > > >>>>>>> have
> > > > > >>>>>>>>>>>> discussed with Jiajie Zhong about the schedule
> interval
> > > > > >>> of
> > > > > >>>>>>> SubDAG.
> > > > > >>>>>>>> If
> > > > > >>>>>>>>>> the
> > > > > >>>>>>>>>>>> SubDagOperator has a different schedule interval, what
> > > > > >>> will
> > > > > >>>>>>> happen
> > > > > >>>>>>>>> for
> > > > > >>>>>>>>>>> the
> > > > > >>>>>>>>>>>> scheduler to schedule the parent DAG?
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> Regards,
> > > > > >>>>>>>>>>>> Nicholas Jiang
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > > > >>>>>>>> bin.huangxb@gmail.com>
> > > > > >>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback!
> > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> I have rethought the concept of subdag and task
> > > > > >>>>>> groups. I
> > > > > >>>>>>>>> think
> > > > > >>>>>>>>>>> the
> > > > > >>>>>>>>>>>>> better way to approach this is to entirely remove
> > > > > >>> subdag
> > > > > >>>>> and
> > > > > >>>>>>>>>> introduce
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>> concept of TaskGroup, which is a container of tasks
> > > > > >>> along
> > > > > >>>>>> with
> > > > > >>>>>>>>> their
> > > > > >>>>>>>>>>>>> dependencies *without execution/scheduling logic as a
> > > > > >>>> DAG*.
> > > > > >>>>>> The
> > > > > >>>>>>>>> only
> > > > > >>>>>>>>>>>>> purpose of it is to group a list of tasks, but you
> > > > > >>> still
> > > > > >>>>> need
> > > > > >>>>>>> to
> > > > > >>>>>>>>> add
> > > > > >>>>>>>>>> it
> > > > > >>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>> a DAG for execution.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Here is a small code snippet.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> ```
> > > > > > >>>>>>>>>>>>> class TaskGroup:
> > > > > > >>>>>>>>>>>>>     """
> > > > > > >>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> > > > > > >>>>>>>>>>>>>     """
> > > > > > >>>>>>>>>>>>>     def __init__(self, group_id, default_args):
> > > > > > >>>>>>>>>>>>>         pass
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> """
> > > > > > >>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to a DAG
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> This can be declared in a separate file from the dag file
> > > > > > >>>>>>>>>>>>> """
> > > > > > >>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
> > > > > > >>>>>>>>>>>>> download_group.add_task(task1)
> > > > > > >>>>>>>>>>>>> task2.dag = download_group
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> with download_group:
> > > > > > >>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> [task1, task2] >> task3
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> """Add it to a DAG for execution"""
> > > > > > >>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> > > > > > >>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
> > > > > > >>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> > > > > > >>>>>>>>>>>>>     start >> download_group
> > > > > > >>>>>>>>>>>>>     # this is equivalent to
> > > > > > >>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> > > > > >>>>>>>>>>>>> ```
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> With this, we can still reuse a group of tasks and
> > > > > >> set
> > > > > >>>>>>>> dependencies
> > > > > >>>>>>>>>>>> between
> > > > > >>>>>>>>>>>>> them; it avoids the boilerplate code from using
> > > > > >>>>>> SubDagOperator,
> > > > > >>>>>>>> and
> > > > > >>>>>>>>>> we
> > > > > >>>>>>>>>>>> can
> > > > > >>>>>>>>>>>>> declare dependencies as `task >> task_group >> task`.
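
One way `task >> task_group >> task` could resolve to plain task-to-task edges is sketched below in plain Python. This is an assumption about the semantics, not the proposed implementation: `Node`, `Task`, `Group`, `roots` and `leaves` are hypothetical names, and the rule assumed is "an upstream node connects to the group's root tasks, and the group's leaf tasks connect to the downstream node".

```python
from typing import Iterable, List, Set


class Node:
    """Anything that can appear on either side of `>>`."""

    def roots(self) -> List["Task"]:
        raise NotImplementedError

    def leaves(self) -> List["Task"]:
        raise NotImplementedError

    def __rshift__(self, other: "Node") -> "Node":
        # Connect every leaf on the left to every root on the right.
        for leaf in self.leaves():
            for root in other.roots():
                leaf.downstream.add(root.task_id)
                root.upstream.add(leaf.task_id)
        return other  # allows chaining: a >> b >> c


class Task(Node):
    def __init__(self, task_id: str):
        self.task_id = task_id
        self.upstream: Set[str] = set()
        self.downstream: Set[str] = set()

    def roots(self) -> List["Task"]:
        return [self]

    def leaves(self) -> List["Task"]:
        return [self]


class Group(Node):
    """A pure grouping of tasks; it holds no scheduling logic of its own."""

    def __init__(self, group_id: str, tasks: Iterable[Task]):
        self.group_id = group_id
        self.tasks = list(tasks)

    def roots(self) -> List[Task]:
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.upstream & ids)]

    def leaves(self) -> List[Task]:
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.downstream & ids)]


task1, task2, task3 = Task("task1"), Task("task2"), Task("task3")
task1 >> task3
task2 >> task3
download_group = Group("download", [task1, task2, task3])

start, end = Task("start"), Task("end")
start >> download_group >> end
print(sorted(start.downstream), sorted(task3.downstream))
```

After the last statement only task-level edges exist, so a scheduler working on the flat DAG would not need to know the group is there.
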
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> User migration wise, we can introduce it before
> > > > > >> Airflow
> > > > > >>>> 2.0
> > > > > >>>>>> and
> > > > > >>>>>>>>> allow
> > > > > >>>>>>>>>>>>> gradual transition. Then we can decide if we still
> > > > > >> want
> > > > > >>>> to
> > > > > >>>>>> keep
> > > > > >>>>>>>> the
> > > > > >>>>>>>>>>>>> SubDagOperator or simply remove it.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Any thoughts?
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Cheers,
> > > > > >>>>>>>>>>>>> Bin
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > > > > >>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> +1, proposal looks good.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> The original intention was really to have tasks
> > > > > >>> groups
> > > > > >>>>> and
> > > > > >>>>>> a
> > > > > >>>>>>>>>>>> zoom-in/out
> > > > > >>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>> the UI. The original reasoning was to reuse the DAG
> > > > > >>>>> object
> > > > > >>>>>>>> since
> > > > > >>>>>>>>> it
> > > > > >>>>>>>>>>> is
> > > > > >>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> > > > > >>> create
> > > > > >>>>>>>> underlying
> > > > > >>>>>>>>>>>>>> confusions since a DAG is much more than just a
> > > > > >> group
> > > > > >>>> of
> > > > > >>>>>>> tasks.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Max
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > > > >>>>>>>>>>>>> joshipoornima06@gmail.com>
> > > > > >>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Thank you for your email.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > > > >>>>>>>>>>> bin.huangxb@gmail.com
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> > > > > >>>>>> rewrites
> > > > > >>>>>>>> the
> > > > > >>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > >>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > > > > >> it
> > > > > >>>>> will
> > > > > >>>>>>>> give a
> > > > > >>>>>>>>>>>> flat
> > > > > >>>>>>>>>>>>>>>>>> structure at
> > > > > >>>>>>>>>>>>>>>>>> the task level
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> The serialized_dag representation already
> > > > > >> does
> > > > > >>>>> this I
> > > > > >>>>>>>>> think.
> > > > > >>>>>>>>>> At
> > > > > >>>>>>>>>>>>> least
> > > > > >>>>>>>>>>>>>>> if
> > > > > >>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > > > > >>> representation,
> > > > > >>>>> but
> > > > > >>>>>> at
> > > > > >>>>>>>>> least
> > > > > >>>>>>>>>>> it
> > > > > >>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
> > > > > >> In
> > > > > >>> my
> > > > > >>>>>>>> proposal
> > > > > >>>>>>>>> as
> > > > > >>>>>>>>>>>> also
> > > > > >>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
> > > > > >> from
> > > > > >>>> the
> > > > > >>>>>>> subdag
> > > > > >>>>>>>>> and
> > > > > >>>>>>>>>>> add
> > > > > >>>>>>>>>>>>>> them
> > > > > >>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG graph
> > > > > >>>> will
> > > > > >>>>>> look
> > > > > >>>>>>>>>> exactly
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>> same as without subdag but with metadata
> > > > > >> attached
> > > > > >>>> to
> > > > > >>>>>>> those
> > > > > >>>>>>>>>>>> sections.
> > > > > >>>>>>>>>>>>>>> These
> > > > > >>>>>>>>>>>>>>>> metadata will be later on used to render in the
> > > > > >>> UI.
> > > > > >>>>> So
> > > > > >>>>>>>> after
> > > > > >>>>>>>>>>>> parsing
> > > > > >>>>>>>>>>>>> (
> > > > > >>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
> > > > > >> the
> > > > > >>>>>>> *root_dag
> > > > > >>>>>>>>>>>> *instead
> > > > > >>>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>>> *root_dag +
> > > > > >>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > > > > >>>>>>>>>> current_group=section-1,
> > > > > >>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> > > > > >>> naming
> > > > > >>>>>>>>>>> suggestions),
> > > > > >>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>> reason for parent_group is that we can have
> > > > > >>> nested
> > > > > >>>>>> group
> > > > > >>>>>>>> and
> > > > > >>>>>>>>>>>> still
> > > > > >>>>>>>>>>>>>> be
> > > > > >>>>>>>>>>>>>>>> able to capture the dependency.
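
A rough sketch of the "extract the tasks and attach group metadata" step follows. It is plain Python over nested dicts rather than real DAG objects; `flatten`, the `"tasks"`/`"subdags"` keys and the example ids are hypothetical, while `current_group` and `parent_group` are the metadata names suggested above.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def flatten(group_id: str, spec: dict, parent_group: Optional[str] = None) -> List[Row]:
    """Return one flat row per task, tagged with its group and parent group."""
    rows: List[Row] = [
        {"task_id": task_id, "current_group": group_id, "parent_group": parent_group}
        for task_id in spec.get("tasks", [])
    ]
    # Nested groups are drained recursively into the same flat list.
    for child_id, child_spec in spec.get("subdags", {}).items():
        rows.extend(flatten(child_id, child_spec, parent_group=group_id))
    return rows


root = {
    "tasks": ["start", "end"],
    "subdags": {
        "section-1": {
            "tasks": ["section-1-task-1", "section-1-task-2"],
            "subdags": {"section-1-inner": {"tasks": ["inner-task"]}},
        }
    },
}

rows = flatten("example_dag", root)
for row in rows:
    print(row)
```

Keeping `parent_group` on every row is what lets a UI rebuild the nesting later even though the runtime structure is a single flat list.
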
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Runtime DAG:
> > > > > >>>>>>>>>>>>>>>> [image: image.png]
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> While at the UI, what we see would be something
> > > > > >>>> like
> > > > > >>>>>> this
> > > > > >>>>>>>> by
> > > > > >>>>>>>>>>>>> utilizing
> > > > > >>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into
> > > > > >> in
> > > > > >>>> some
> > > > > >>>>>>> way.
> > > > > >>>>>>>>>>>>>>>> [image: image.png]
> > > > > >>>>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>>>> The benefits I can see are:
> > > > > >>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > > > > >>> complexity
> > > > > >>>> of
> > > > > >>>>>>>> SubDag
> > > > > >>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>> execution
> > > > > >>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> > > > > >> using
> > > > > >>>>>> SubDag.
> > > > > >>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
> > > > > >>>>> reusable
> > > > > >>>>>>> dag
> > > > > >>>>>>>>> code
> > > > > >>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>> declare dependencies between them. And with the
> > > > > >>> new
> > > > > >>>>>>>>>>> SubDagOperator
> > > > > >>>>>>>>>>>>> (see
> > > > > >>>>>>>>>>>>>>> AIP
> > > > > >>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
> > > > > >>>>> function
> > > > > >>>>>>> for
> > > > > >>>>>>>>>>>>> generating 1
> > > > > >>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
> > > > > >>> (in
> > > > > >>>>> this
> > > > > >>>>>>>> case,
> > > > > >>>>>>>>>> it
> > > > > >>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>> just
> > > > > >>>>>>>>>>>>>>>> extract all underlying tasks and append to the
> > > > > >>> root
> > > > > >>>>>> dag).
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> > > > > >>>> with a
> > > > > >>>>>>>>>> simpler
> > > > > >>>>>>>>>>>>>> concept
> > > > > >>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> > > > > >> out
> > > > > >>>> the
> > > > > >>>>>>>>>> contents
> > > > > >>>>>>>>>>>> of
> > > > > >>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>> SubDag
> > > > > >>>>>>>>>>>>>>>> and becomes more like
> > > > > >>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > > > > >>>>>>>>>>>>>>> (forgive
> > > > > > >>>>>>>>>>>>>>>> me about the crazy name..). In this case, is it
> > > > > >>>> still
> > > > > >>>>>>>>>>> necessary
> > > > > >>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>> keep the
> > > > > >>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a
> > > > > >>>> name?
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> > > > > >>>> Chris
> > > > > >>>>>>> Palmer
> > > > > >>>>>>>>> for
> > > > > >>>>>>>>>>>>> helping
> > > > > >>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup, I
> > > > > >>>> will
> > > > > >>>>>> just
> > > > > >>>>>>>>> paste
> > > > > >>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>> here.
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > > > > >> in
> > > > > >>>> the
> > > > > >>>>>> same
> > > > > >>>>>>>>>>>> TaskGroup,
> > > > > >>>>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > > > > >> a
> > > > > >>>>>>> TaskGroup
> > > > > >>>>>>>>>> and
> > > > > >>>>>>>>>>>>>> either a
> > > > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > > > > >> in
> > > > > >>>> any
> > > > > >>>>>>> group
> > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > > >>> TaskGroup
> > > > > >>>>> and
> > > > > >>>>>>>>>> either
> > > > > >>>>>>>>>>>>> other
> > > > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > > >> as
> > > > > >>> a
> > > > > >>>>>> single
> > > > > >>>>>>>>>>>> "object",
> > > > > >>>>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > > >>>>> "status"
> > > > > >>>>>>> of a
> > > > > >>>>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>>>> was
> > > > > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> I agree with Chris:
> > > > > >>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > > > > >>> executor), I
> > > > > >>>>>> think
> > > > > >>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>>>> should
> > > > > >>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> > > > > >> to
> > > > > >>>>>>> implement
> > > > > >>>>>>>>>> some
> > > > > >>>>>>>>>>>>>> metadata
> > > > > >>>>>>>>>>>>>>>> operations that allows start/stop a group of
> > > > > >>> tasks
> > > > > >>>>>> etc.)
> > > > > >>>>>>>>>>>>>>>> - From the UI's View, it should be able to pick
> > > > > >>> up
> > > > > >>>>> the
> > > > > >>>>>>>>>> individual
> > > > > >>>>>>>>>>>>>> tasks'
> > > > > >>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > > > > >> status
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Bin
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > > > > >> Imberman
> > > > > >>> <
> > > > > >>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator
> > > > > >>> to
> > > > > >>>>> tie
> > > > > >>>>>>> dags
> > > > > >>>>>>>>>>>> together
> > > > > >>>>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>>> I
> > > > > >>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if we
> > > > > >>>> could
> > > > > >>>>>>>>>> essentially
> > > > > >>>>>>>>>>>>> write
> > > > > >>>>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > > > > >>>> starter-tasks
> > > > > >>>>>> for
> > > > > >>>>>>>>> that
> > > > > >>>>>>>>>>> DAG.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
> > > > > >> UI
> > > > > >>>>>> concept.
> > > > > >>>>>>>> It
> > > > > >>>>>>>>>>>> doesn’t
> > > > > >>>>>>>>>>>>>> need
> > > > > >>>>>>>>>>>>>>>>> to execute separately, you’re just adding more
> > > > > >>>> tasks
> > > > > >>>>>> to
> > > > > >>>>>>>> the
> > > > > >>>>>>>>>>> queue
> > > > > >>>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>> will
> > > > > >>>>>>>>>>>>>>>>> be executed when there are resources
> > > > > >> available.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
> > > > > >> <
> > > > > >>>>>>>>>>> chris@crpalmer.com
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
> > > > > >>>>>> abstraction.
> > > > > >>>>>>> I
> > > > > >>>>>>>>>> think
> > > > > >>>>>>>>>>>> what
> > > > > >>>>>>>>>>>>>> is
> > > > > >>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
> > > > > >> high
> > > > > >>>>> level
> > > > > >>>>>> I
> > > > > >>>>>>>>> think
> > > > > >>>>>>>>>>> you
> > > > > >>>>>>>>>>>>> want
> > > > > >>>>>>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>>> functionality:
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in
> > > > > >>> the
> > > > > >>>>>> same
> > > > > >>>>>>>>>>> TaskGroup,
> > > > > >>>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in a
> > > > > >>>>>> TaskGroup
> > > > > >>>>>>>> and
> > > > > >>>>>>>>>>>> either
> > > > > >>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not in
> > > > > >>> any
> > > > > >>>>>> group
> > > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > > >>> TaskGroup
> > > > > >>>>> and
> > > > > >>>>>>>> either
> > > > > >>>>>>>>>>> other
> > > > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > > >> as a
> > > > > >>>>>> single
> > > > > >>>>>>>>>>> "object",
> > > > > >>>>>>>>>>>>> but
> > > > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > > >>>> "status"
> > > > > >>>>>> of
> > > > > >>>>>>> a
> > > > > >>>>>>>>>>>> TaskGroup
> > > > > >>>>>>>>>>>>>> was
> > > > > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
> > > > > >>> object
> > > > > >>>>>> with
> > > > > >>>>>>>> its
> > > > > >>>>>>>>>> own
> > > > > >>>>>>>>>>>>>> database
> > > > > >>>>>>>>>>>>>>>>> table and model or just another attribute on
> > > > > >>>> tasks.
> > > > > >>>>> I
> > > > > >>>>>>>> think
> > > > > >>>>>>>>>> you
> > > > > >>>>>>>>>>>>> could
> > > > > >>>>>>>>>>>>>>>>> build
> > > > > >>>>>>>>>>>>>>>>> it in a way such that from the schedulers
> > > > > >> point
> > > > > >>> of
> > > > > >>>>>> view
> > > > > >>>>>>> a
> > > > > >>>>>>>>> DAG
> > > > > >>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
> > > > > >> differently.
> > > > > >>> So
> > > > > >>>>> it
> > > > > >>>>>>>> really
> > > > > >>>>>>>>>>> just
> > > > > >>>>>>>>>>>>>>> becomes
> > > > > >>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>> shortcut for setting dependencies between sets
> > > > > >>> of
> > > > > >>>>>> Tasks,
> > > > > >>>>>>>> and
> > > > > >>>>>>>>>>>> allows
> > > > > >>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>> UI
> > > > > >>>>>>>>>>>>>>>>> to simplify the render of the DAG structure.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> Chris
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > > > >>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
> > > > > >> the
> > > > > >>>> more
> > > > > >>>>>>>>> important
> > > > > >>>>>>>>>>>> issue
> > > > > >>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>> fix),
> > > > > >>>>>>>>>>>>>>>>>> but I am still convinced Ash' idea is the
> > > > > >>> right
> > > > > >>>>> way
> > > > > >>>>>>>>> forward
> > > > > >>>>>>>>>>>> (just
> > > > > >>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>> might
> > > > > >>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
> > > > > >>> adding
> > > > > >>>>>>> visual
> > > > > >>>>>>>>>>> grouping
> > > > > >>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>> UI).
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> There was a previous thread about this FYI
> > > > > >>> with
> > > > > >>>>> more
> > > > > >>>>>>>>> context
> > > > > >>>>>>>>>>> on
> > > > > >>>>>>>>>>>>> why
> > > > > >>>>>>>>>>>>>>>>> subdags
> > > > > >>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>
> > > > > >>
> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > > >>>>>>>>>>>>>> . A
> > > > > >>>>>>>>>>>>>>>>>> solution I outline there to Jame's problem
> > > > > >> is
> > > > > >>>> e.g.
> > > > > >>>>>>>>> enabling
> > > > > >>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> operator
> > > > > >>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as
> > > > > >>>> well. I
> > > > > >>>>>> see
> > > > > >>>>>>>>> this
> > > > > >>>>>>>>>>>> being
> > > > > >>>>>>>>>>>>>>>>> separate
> > > > > >>>>>>>>>>>>>>>>>> from Ash' solution for DAG grouping in the
> > > > > >> UI
> > > > > >>>> but
> > > > > >>>>>> one
> > > > > >>>>>>> of
> > > > > >>>>>>>>> the
> > > > > >>>>>>>>>>> two
> > > > > >>>>>>>>>>>>>> items
> > > > > >>>>>>>>>>>>>>>>>> required to replace all existing subdag
> > > > > >>>>>> functionality.
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
> > > > > >> and
> > > > > >>>>> they
> > > > > >>>>>>> are
> > > > > >>>>>>>>>>> always a
> > > > > >>>>>>>>>>>>>> giant
> > > > > >>>>>>>>>>>>>>>>> pain
> > > > > >>>>>>>>>>>>>>>>>> to use. They are a constant source of user
> > > > > >>>>> confusion
> > > > > >>>>>>> and
> > > > > >>>>>>>>>>>> breakages
> > > > > >>>>>>>>>>>>>>>>> during
> > > > > >>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
> > > > > >> Coder <
> > > > > >>>>>>>>>>>> jcoder01@gmail.com>
> > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a
> > > > > >> UI
> > > > > >>>>>>> concept. I
> > > > > >>>>>>>>> use
> > > > > >>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>> subdag
> > > > > >>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If
> > > > > >>> you
> > > > > >>>>>> have a
> > > > > >>>>>>>>> group
> > > > > >>>>>>>>>>> of
> > > > > >>>>>>>>>>>>>> tasks
> > > > > >>>>>>>>>>>>>>>>> that
> > > > > >>>>>>>>>>>>>>>>>>> need to finish before another group of
> > > > > >> tasks
> > > > > >>>>>> start,
> > > > > >>>>>>>>> using
> > > > > >>>>>>>>>> a
> > > > > >>>>>>>>>>>>> subdag
> > > > > >>>>>>>>>>>>>>> is
> > > > > >>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies
> > > > > >>>> and I
> > > > > >>>>>>> think
> > > > > >>>>>>>>>> also
> > > > > >>>>>>>>>>>> make
> > > > > >>>>>>>>>>>>>> it
> > > > > >>>>>>>>>>>>>>>>>> easier
> > > > > >>>>>>>>>>>>>>>>>>> to follow the dag code.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
> > > > > >> Hamlin
> > > > > >>> <
> > > > > >>>>>>>>>>>>> hamlin.kn@gmail.com>
> > > > > >>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > > > >>>>>> Berlin-Taylor
> > > > > >>>>>>> <
> > > > > >>>>>>>>>>>>>> ash@apache.org
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Question:
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator
> > > > > >>>> anymore?
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just
> > > > > >>>>> replacing
> > > > > >>>>>> it
> > > > > >>>>>>>>> with
> > > > > >>>>>>>>>> a
> > > > > >>>>>>>>>>> UI
> > > > > >>>>>>>>>>>>>>>>> grouping
> > > > > >>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less
> > > > > >> to
> > > > > >>>> get
> > > > > >>>>>>>> wrong,
> > > > > >>>>>>>>>> and
> > > > > >>>>>>>>>>>>> closer
> > > > > >>>>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>>>>> what
> > > > > >>>>>>>>>>>>>>>>>>>>> users actually want to achieve with
> > > > > >>>> subdags?
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in
> > > > > >>>> subdags
> > > > > >>>>>>> could
> > > > > >>>>>>>>>> start
> > > > > >>>>>>>>>>>>>> running
> > > > > >>>>>>>>>>>>>>> in
> > > > > >>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should
> > > > > >> we
> > > > > >>>> not
> > > > > >>>>>>> also
> > > > > >>>>>>>>> just
> > > > > >>>>>>>>>>>>>>> _enitrely_
> > > > > >>>>>>>>>>>>>>>>>>> remove
> > > > > >>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace
> > > > > >> it
> > > > > >>>> with
> > > > > >>>>>>>>> something
> > > > > >>>>>>>>>>>>>> simpler.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I
> > > > > >>> haven't
> > > > > >>>>> used
> > > > > >>>>>>>> them
> > > > > >>>>>>>>>>>>>> extensively
> > > > > >>>>>>>>>>>>>>> so
> > > > > >>>>>>>>>>>>>>>>>> may
> > > > > >>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> > > > > >>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it
> > > > > >>>> has(?)
> > > > > >>>>> to
> > > > > >>>>>>> be
> > > > > >>>>>>>> of
> > > > > >>>>>>>>>> the
> > > > > >>>>>>>>>>>>> form
> > > > > >>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
> > > > > >>>>>>>>>>>>>>>>>>>>> - They need their own
> > > > > >> schedule_interval,
> > > > > >>>> but
> > > > > >>>>>> it
> > > > > >>>>>>>> has
> > > > > >>>>>>>>> to
> > > > > >>>>>>>>>>>> match
> > > > > >>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>> parent
> > > > > >>>>>>>>>>>>>>>>>>>> dag
> > > > > >>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own.
> > > > > >>>> (Does
> > > > > >>>>>> it
> > > > > >>>>>>>> make
> > > > > >>>>>>>>>>> sense
> > > > > >>>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>> do
> > > > > >>>>>>>>>>>>>>>>>> this?
> > > > > >>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the
> > > > > >>> sub
> > > > > >>>>> dag
> > > > > >>>>>>>> would
> > > > > >>>>>>>>>>> never
> > > > > >>>>>>>>>>>>>>>>> execute, so
> > > > > >>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.
> > > > > >>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to
> > > > > >>>>> operator a
> > > > > >>>>>>>>> subdag
> > > > > >>>>>>>>>>> with
> > > > > >>>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>>>>>> always
> > > > > >>>>>>>>>>>>>>>>>> a
> > > > > >>>>>>>>>>>>>>>>>>>>> bit of a kludge.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Thoughts?
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> -ash
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash
> > > > > >>>>>> Berlin-Taylor <
> > > > > >>>>>>>>>>>>>> ash@apache.org>
> > > > > >>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> Workon sub-dags is much needed, I'm
> > > > > >>>>> excited
> > > > > >>>>>> to
> > > > > >>>>>>>> see
> > > > > >>>>>>>>>> how
> > > > > >>>>>>>>>>>>> this
> > > > > >>>>>>>>>>>>>>>>>>> progresses.
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > > > >>> parsing*:
> > > > > >>>>> This
> > > > > >>>>>>>>>> rewrites
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > >>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > > > >>> parsing,
> > > > > >>>>> and
> > > > > >>>>>> it
> > > > > >>>>>>>>> will
> > > > > >>>>>>>>>>>> give a
> > > > > >>>>>>>>>>>>>>> flat
> > > > > >>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > > >>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation
> > > > > >>>> already
> > > > > >>>>>> does
> > > > > >>>>>>>>> this
> > > > > >>>>>>>>>> I
> > > > > >>>>>>>>>>>>> think.
> > > > > >>>>>>>>>>>>>>> At
> > > > > >>>>>>>>>>>>>>>>>> least
> > > > > >>>>>>>>>>>>>>>>>>>> if
> > > > > >>>>>>>>>>>>>>>>>>>>>> I've understood your idea here
> > > > > >>>> correctly.
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> -ash
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin
> > > > > >>>> Huang <
> > > > > >>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone and
> > > > > >>>> collect
> > > > > >>>>>>>>> feedback
> > > > > >>>>>>>>>> on
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>> AIP-34
> > > > > >>>>>>>>>>>>>>>>>> on
> > > > > >>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was
> > > > > >>>>>> previously
> > > > > >>>>>>>>>> briefly
> > > > > >>>>>>>>>>>>>>>>> mentioned in
> > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be
> > > > > >>> done
> > > > > >>>>> for
> > > > > >>>>>>>>> Airflow
> > > > > >>>>>>>>>>> 2.0,
> > > > > >>>>>>>>>>>>> and
> > > > > >>>>>>>>>>>>>>>>> one of
> > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator
> > > > > >>> attach
> > > > > >>>>>> tasks
> > > > > >>>>>>>> back
> > > > > >>>>>>>>>> to
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>> root
> > > > > >>>>>>>>>>>>>>>>> DAG.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving
> > > > > >>>>>> SubDagOperator
> > > > > >>>>>>>>>> related
> > > > > >>>>>>>>>>>>>> issues
> > > > > >>>>>>>>>>>>>>> by
> > > > > >>>>>>>>>>>>>>>>>>>>> reattaching
> > > > > >>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag
> > > > > >> while
> > > > > >>>>>>> respecting
> > > > > >>>>>>>>>>>>>> dependencies
> > > > > >>>>>>>>>>>>>>>>>> during
> > > > > >>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping
> > > > > >> effect
> > > > > >>>> on
> > > > > >>>>>> the
> > > > > >>>>>>> UI
> > > > > >>>>>>>>>> will
> > > > > >>>>>>>>>>> be
> > > > > >>>>>>>>>>>>>>>>> achieved
> > > > > >>>>>>>>>>>>>>>>>>>> through
> > > > > >>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory
> > > > > >>>> function
> > > > > >>>>>> more
> > > > > >>>>>>>>>>> reusable
> > > > > >>>>>>>>>>>>>>> because
> > > > > >>>>>>>>>>>>>>>>> you
> > > > > >>>>>>>>>>>>>>>>>>>> don't
> > > > > >>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and
> > > > > >>>>>>> child_dag_name
> > > > > >>>>>>>>> in
> > > > > >>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>> function
> > > > > >>>>>>>>>>>>>>>>>>>>> signature
> > > > > >>>>>>>>>>>>>>>>>>>>>>> anymore.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > > > >>> parsing*:
> > > > > >>>>> This
> > > > > >>>>>>>>>> rewrites
> > > > > >>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > > >>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > > > >>> parsing,
> > > > > >>>>> and
> > > > > >>>>>> it
> > > > > >>>>>>>>> will
> > > > > >>>>>>>>>>>> give a
> > > > > >>>>>>>>>>>>>>> flat
> > > > > >>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > > >>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The
> > > > > >> new
> > > > > >>>>>>>>> SubDagOperator
> > > > > >>>>>>>>>>>> acts
> > > > > >>>>>>>>>>>>>>> like a
> > > > > >>>>>>>>>>>>>>>>>>>>>>> container and most of the original
> > > > > >>>>> methods
> > > > > >>>>>>> are
> > > > > >>>>>>>>>>> removed.
> > > > > >>>>>>>>>>>>> The
> > > > > >>>>>>>>>>>>>>>>>>>>>>> signature is
> > > > > >>>>>>>>>>>>>>>>>>>>>>> also changed to *subdag_factory
> > > > > >> *with
> > > > > >>>>>>>>> *subdag_args
> > > > > >>>>>>>>>>> *and
> > > > > >>>>>>>>>>>>>>>>>>>>> *subdag_kwargs*.
> > > > > >>>>>>>>>>>>>>>>>>>>>>> This is similar to the
> > > > > >> PythonOperator
> > > > > >>>>>>>> signature.
> > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add
> > > > > >>>>>>> current_group
> > > > > >>>>>>>> &
> > > > > >>>>>>>>>>>>>> parent_group
> > > > > >>>>>>>>>>>>>>>>>>>>> attributes
> > > > > >>>>>>>>>>>>>>>>>>>>>>> to BaseOperator*: This metadata is
> > > > > >>> used
> > > > > >>>>> to
> > > > > >>>>>>>> group
> > > > > >>>>>>>>>>> tasks
> > > > > >>>>>>>>>>>>> for
> > > > > >>>>>>>>>>>>>>>>>>>>>>> rendering at
> > > > > >>>>>>>>>>>>>>>>>>>>>>> UI level. It may potentially extend
> > > > > >>>>> further
> > > > > >>>>>>> to
> > > > > >>>>>>>>>> group
> > > > > >>>>>>>>>>>>>>> arbitrary
> > > > > >>>>>>>>>>>>>>>>>>> tasks
> > > > > >>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to
> > > > > >>> allow
> > > > > >>>>>>>>> group-level
> > > > > >>>>>>>>>>>>>> operations
> > > > > >>>>>>>>>>>>>>>>>>> (i.e.
> > > > > >>>>>>>>>>>>>>>>>>>>>>> stop/trigger a group of task within
> > > > > >>> the
> > > > > >>>>>> dag)
> > > > > >>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*:
> > > > > >> Proposed
> > > > > >>>> UI
> > > > > >>>>>>>>>> modification
> > > > > >>>>>>>>>>>> to
> > > > > >>>>>>>>>>>>>>> allow
> > > > > >>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a
> > > > > >>>> flat
> > > > > >>>>>>>>> structure
> > > > > >>>>>>>>>> to
> > > > > >>>>>>>>>>>>> pair
> > > > > >>>>>>>>>>>>>>> with
> > > > > >>>>>>>>>>>>>>>>>>> the
> > > > > >>>>>>>>>>>>>>>>>>>>> first
> > > > > >>>>>>>>>>>>>>>>>>>>>>> change instead of the original
> > > > > >>>>> hierarchical
> > > > > >>>>>>>>>>> structure.
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Please see related documents and
> > > > > >> PRs
> > > > > >>>> for
> > > > > >>>>>>>> details:
> > > > > >>>>>>>>>>>>>>>>>>>>>>> AIP:
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Original Issue:
> > > > > >>>>>>>>>>>>>>> https://github.com/apache/airflow/issues/8078
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Draft PR:
> > > > > >>>>>>>>>>> https://github.com/apache/airflow/pull/9243
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any
> > > > > >>>>> aspects
> > > > > >>>>>>>> that
> > > > > >>>>>>>>>> you
> > > > > >>>>>>>>>>>>>>>>>> agree/disagree
> > > > > >>>>>>>>>>>>>>>>>>>>>>> with or
> > > > > >>>>>>>>>>>>>>>>>>>>>>> need more clarification (especially
> > > > > >>> the
> > > > > >>>>>> third
> > > > > >>>>>>>>>> change
> > > > > >>>>>>>>>>>>>>> regarding
> > > > > >>>>>>>>>>>>>>>>>>>>> TaskGroup).
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Any comments are welcome and I am
> > > > > >>>> looking
> > > > > >>>>>>>> forward
> > > > > >>>>>>>>>> to
> > > > > >>>>>>>>>>>> it!
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > > > >>>>>>>>>>>>>>>>>>>>>>> Bin
> > > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>>>> Thanks & Regards
> > > > > >>>>>>>>>>>>>>> Poornima
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> --
> > > > > >>>>>>>
> > > > > >>>>>>> Jarek Potiuk
> > > > > >>>>>>> Polidea <https://www.polidea.com/> | Principal Software
> > > Engineer
> > > > > >>>>>>>
> > > > > >>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > > > >>>>> <+48%20660%20796%20129>>
> > > > > >>>>>>> [image: Polidea] <https://www.polidea.com/>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> --
> > > > > >>>>>
> > > > > >>>>> Jarek Potiuk
> > > > > >>>>> Polidea <https://www.polidea.com/> | Principal Software
> > Engineer
> > > > > >>>>>
> > > > > >>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > > > >>>>> <+48%20660%20796%20129>>
> > > > > >>>>> [image: Polidea] <https://www.polidea.com/>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> --
> > > > > >>>>
> > > > > >>>> *Jacob Ferriero*
> > > > > >>>>
> > > > > >>>> Strategic Cloud Engineer: Data Engineering
> > > > > >>>>
> > > > > >>>> jferriero@google.com
> > > > > >>>>
> > > > > >>>> 617-714-2509
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
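
As a rough illustration of the grouping idea discussed in the thread above
(everything stays one flat DAG of tasks and edges, and a TaskGroup only
changes what the Graph View draws), here is a minimal Python sketch. It does
not use any Airflow API: collapse_groups, group_of and the task names are
hypothetical and chosen only for this example.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def collapse_groups(
    edges: List[Edge],
    group_of: Dict[str, Optional[str]],
    collapsed: Set[str],
) -> Set[Edge]:
    """Return the edges to draw when the groups in `collapsed` are folded.

    Hypothetical helper: a task in a collapsed group is drawn as its group
    node, and edges that end up inside one collapsed group are dropped.
    """

    def display_node(task_id: str) -> str:
        group = group_of.get(task_id)
        return group if group in collapsed else task_id

    drawn: Set[Edge] = set()
    for upstream, downstream in edges:
        src, dst = display_node(upstream), display_node(downstream)
        if src != dst:  # skip edges hidden inside a collapsed group
            drawn.add((src, dst))
    return drawn


# A flat DAG: these task-level edges never change, whatever the UI shows.
edges = [
    ("start", "extract_a"),
    ("start", "extract_b"),
    ("extract_a", "load"),
    ("extract_b", "load"),
    ("load", "end"),
]
# Only the two extract tasks belong to a (hypothetical) group "extract".
group_of = {"extract_a": "extract", "extract_b": "extract"}

print(sorted(collapse_groups(edges, group_of, collapsed={"extract"})))
print(sorted(collapse_groups(edges, group_of, collapsed=set())))
```

With the "extract" group collapsed the sketch draws three edges instead of
five, while the underlying task-level edges are left untouched.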

Re: [AIP-34] Rewrite SubDagOperator

Posted by Kaxil Naik <ka...@gmail.com>.
Permissions granted, let me know if you face any issues.

On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yu...@gmail.com> wrote:

> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
>
> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> > What's your ID i.e. if you haven't created an account yet, please create
> > one at https://cwiki.apache.org/confluence/signup.action and send us your
> > ID and we will add permissions.
> >
> > Thanks. I'll edit the AIP. May I request permission to edit it?
> > > My wiki user email is yuqian1990@gmail.com.
> >
> >
> > On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yu...@gmail.com> wrote:
> >
> > > Re, Xinbin. Thanks. I'll edit the AIP. May I request permission to edit it?
> > > My wiki user email is yuqian1990@gmail.com.
> > >
> > > Re Gerard: yes the UI loads all the nodes as json from the web server at
> > > once. However, it only adds the top level nodes and edges to the graph when
> > > the Graph View page is first opened. And then adds the expanded nodes to
> > > the graph as the user expands them. From what I've experienced with DAGs
> > > containing around 400 tasks (not using TaskGroup or SubDagOperator),
> > > opening the whole dag in Graph View usually takes 5 seconds. Less than 60ms
> > > of that is taken by loading the data from webserver. The remaining 4.9s+ is
> > > taken by javascript functions in dagre-d3.min.js such as createNodes,
> > > createEdgeLabels, etc and by rendering the graph. With TaskGroup being used
> > > to group tasks into a smaller number of top-level nodes, the amount of data
> > > loaded from webserver will remain about the same compared to a flat dag of
> > > the same size, but the number of nodes and edges that need to be plotted on
> > > the graph can be reduced significantly. So in theory this should speed up the
> > > time it takes to open Graph View even without lazy-loading the data (I'll
> > > experiment to find out). That said, if it comes to a point lazy-loading
> > > helps, we can still implement it as an improvement.
> > >
> > > Re James: the Tree View looks as if all the groups are fully expanded
> > > (because under the hood all the tasks are in a single DAG). I'm less
> > > worried about Tree View at the moment because it already has a mechanism
> > > for collapsing tasks by the dependency tree. That said, the Tree View can
> > > definitely be improved too with TaskGroup (e.g. collapse tasks in the same
> > > TaskGroup when Tree View is first opened).
> > >
> > > For both suggestions, implementing them doesn't require fundamental changes
> > > to the idea. I think we can have a basic working TaskGroup first, and then
> > > improve it incrementally in several PRs as we get more feedback from the
> > > community. What do you think?
> > >
> > > Qian
> > >
> > >
> > > On Wed, Aug 12, 2020 at 9:15 AM James Coder <jc...@gmail.com> wrote:
> > >
> > > > I agree this looks great, one question, how does the tree view look?
> > > >
> > > > James Coder
> > > >
> > > > > On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gcasassaez@twitter.com.invalid> wrote:
> > > > > >
> > > > > > First of all, this is awesome!!
> > > > > >
> > > > > > Secondly, checking your UI code, seems you are loading all operators at
> > > > > > once. Wondering if we can load them as needed (aka load whenever we click
> > > > > > the TaskGroup). Some of our DAGs are so large that they take forever to load
> > > > > > on the Graph view, so worried about this still being an issue here. It may be
> > > > > > easily solvable by implementing lazy loading of the graph. Not sure how
> > > > > > easy it is to implement/add to the UI extension (and don't want to push for
> > > > > > early optimization as it's the root of all evil).
> > > > > Gerard Casas Saez
> > > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > >
> > > > >
> > > > > >> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > >>
> > > > > >> Hi Yu,
> > > > > >>
> > > > > >> Thank you so much for taking on this. I was fairly distracted previously
> > > > > >> and I didn't have the time to update the proposal. In fact, after
> > > > > >> discussing with Ash, Kaxil and Daniel, the direction of this AIP has been
> > > > > >> changed to favor the concept of TaskGroup instead of rewriting
> > > > > >> SubDagOperator (though it may make sense to deprecate SubDag at a future
> > > > > >> date).
> > > > > >>
> > > > > >> Your PR is amazing and it has implemented the desired features. I think we
> > > > > >> can focus on your new PR instead. Do you mind updating the AIP based on
> > > > > >> what you have done in your PR?
> > > > >>
> > > > >> Best,
> > > > >> Bin
> > > > >>
> > > > >>
> > > > > >>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yu...@gmail.com> wrote:
> > > > > >>>
> > > > > >>> Hi, all, I've added the basic UI changes to my proposed implementation of
> > > > > >>> TaskGroup as UI grouping concept:
> > > > > >>> https://github.com/apache/airflow/pull/10153
> > > > > >>>
> > > > > >>> I think Chris had a pretty good specification of TaskGroup so I'm quoting
> > > > > >>> it here. The only thing I don't fully agree with is the restriction
> > > > > >>> "... *cannot* have dependencies between a Task in a TaskGroup and either a
> > > > > >>> Task in a different TaskGroup or a Task not in any group". I think
> > > > > >>> this is over restrictive. Since TaskGroup is a UI concept, tasks can have
> > > > > >>> dependencies on tasks in other TaskGroup or not in any TaskGroup. In my PR,
> > > > > >>> this is allowed. The graph edges will update accordingly when TaskGroups
> > > > > >>> are expanded/collapsed. TaskGroup is only helping to make the UI look less
> > > > > >>> crowded. Under the hood, everything is still a DAG of tasks and edges so
> > > > > >>> things work normally. Here's a screenshot
> > > > > >>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > > > > >>> of the UI interaction.
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> *   - Tasks can be added to a TaskGroup   - You *can* have
> > > dependencies
> > > > >>> between Tasks in the same TaskGroup, but   *cannot* have
> > dependencies
> > > > >>> between a Task in a TaskGroup and either a   Task in a different
> > > > >> TaskGroup
> > > > >>> or a Task not in any group   - You *can* have dependencies
> between
> > a
> > > > >>> TaskGroup and either other   TaskGroups or Tasks not in any group
> >  -
> > > > The
> > > > >>> UI will by default render a TaskGroup as a single "object", but
> > >  which
> > > > >> you
> > > > >>> expand or zoom into in some way   - You'd need some way to
> > determine
> > > > what
> > > > >>> the "status" of a TaskGroup was   at least for UI display
> purposes*
> > > > >>>
> > > > >>>
> > > > > >>> Regarding Jake's comment, I agree it's possible to implement the "retrying
> > > > > >>> tasks in a group" pattern he mentioned as an optional feature of TaskGroup
> > > > > >>> although that may go against having TaskGroup as a pure UI concept. For the
> > > > > >>> motivating example Jake provided, I suggest implementing both
> > > > > >>> SubmitLongRunningJobTask and PollJobStatusSensor in a single operator. It
> > > > > >>> can do something like BaseSensorOperator.execute() does in "reschedule"
> > > > > >>> mode, i.e. it first executes some code to submit the long running job to
> > > > > >>> the external service, and stores the state (e.g. in XCom). Then it
> > > > > >>> reschedules itself. Subsequent runs then poke for the completion state.
> > > > >>>
> > > > >>>
> > > > > >>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero <jferriero@google.com.invalid> wrote:
> > > > > >>>
> > > > > >>>> I really like this idea of a TaskGroup container as I think this will be
> > > > > >>>> much easier to use than SubDag.
> > > > > >>>>
> > > > > >>>> I'd like to propose an optional behavior for special retry mechanics via a
> > > > > >>>> TaskGroup.retry_all property.
> > > > > >>>> This way I could use TaskGroup to replace my favorite use of SubDag for
> > > > > >>>> atomically retrying tasks of the pattern "act on external state then
> > > > > >>>> reschedule poll until desired state reached".
> > > > > >>>>
> > > > > >>>> The motivating use case I have for a SubDag is a very simple two task group
> > > > > >>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > > > >>>> I use SubDag because it gives me an easy way to retry the SubmitJobTask
> > > > > >>>> if something about the PollJobSensor fails.
> > > > > >>>> This pattern would be really nice for jobs that are expected to run a long
> > > > > >>>> time (because the sensor can use reschedule mode, freeing up slots)
> > > > > >>>> but might fail for a retryable reason.
> > > > > >>>> However, using SubDag to meet this use case defeats the purpose because
> > > > > >>>> SubDag infamously
> > > > > >>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> > > > > >>>> blocks a "controller" slot for the entire duration.
> > > > > >>>> This may feel like a cyclic behavior but in reality it is very common for a
> > > > > >>>> single operator to submit job / wait til done.
> > > > > >>>> We could use this case to refactor many operators (e.g. BQ, Dataproc,
> > > > > >>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with an
> > > > > >>>> optional reschedule mode if the user knows that this job may take a long
> > > > > >>>> time.
> > > > > >>>>
> > > > > >>>> I'd be happy to do the development work on adding this specific retry
> > > > > >>>> behavior to TaskGroup once the base concept is implemented if others in the
> > > > > >>>> community would find this a useful feature.
> > > > >>>>
> > > > >>>> Cheers,
> > > > >>>> Jake
> > > > >>>>
> > > > >>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> > > > Jarek.Potiuk@polidea.com
> > > > >>>
> > > > >>>> wrote:
> > > > >>>>
> > > > > >>>>> All for it :). I think we are getting closer to having regular
> > > > > >>>>> planning, a structured approach to 2.0, and a task force for it
> > > > > >>>>> starting soon, so I think it should be perfectly fine to discuss
> > > > > >>>>> and even start implementing what's beyond as soon as we make sure
> > > > > >>>>> that we are prioritizing 2.0 work.
> > > > >>>>>
> > > > >>>>> J,
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yu...@gmail.com>
> > > > >> wrote:
> > > > >>>>>
> > > > >>>>>> Hi Jarek,
> > > > >>>>>>
> > > > >>>>>> I agree we should not change the behaviour of the existing
> > > > >>>> SubDagOperator
> > > > >>>>>> till Airflow 2.1. Is it okay to continue the discussion about
> > > > >>> TaskGroup
> > > > >>>>> as
> > > > >>>>>> a brand new concept/feature independent from the existing
> > > > >>>> SubDagOperator?
> > > > > >>>>>> In other words, shall we add TaskGroup as a UI grouping concept
> > > > > >>>>>> like Ash suggested, and not touch SubDagOperator at all? Whenever
> > > > > >>>>>> we are ready with TaskGroup, we then deprecate SubDagOperator in
> > > > > >>>>>> Airflow 2.1.
> > > > >>>>>>
> > > > >>>>>> I really like Ash's idea of simplifying the SubDagOperator
> idea
> > > > >> into
> > > > >>> a
> > > > >>>>>> simple UI grouping concept. I think Xinbin's idea of
> > "reattaching
> > > > >> all
> > > > >>>> the
> > > > >>>>>> tasks to the root DAG" is the way to go. And I see James
> pointed
> > > > >> out
> > > > >>> we
> > > > >>>>>> need some helper functions to simplify dependencies setting of
> > > > >>>> TaskGroup.
> > > > >>>>>> Xinbin put up a pretty elegant example in his PR
> > > > >>>>>> <https://github.com/apache/airflow/pull/9243>. I think having
> > > > >>>> TaskGroup
> > > > >>>>> as
> > > > >>>>>> a UI concept should be a relatively small change. We can
> > simplify
> > > > >>>>> Xinbin's
> > > > >>>>>> PR further. So I put up this alternative proposal here:
> > > > >>>>>> https://github.com/apache/airflow/pull/10153
> > > > >>>>>>
> > > > >>>>>> I have not done any UI changes due to lack of experience with
> > web
> > > > >> UI.
> > > > >>>> If
> > > > >>>>>> anyone's interested, please take a look at the PR.
> > > > >>>>>>
> > > > >>>>>> Qian
> > > > >>>>>>
> > > > >>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > > > >>> Jarek.Potiuk@polidea.com
> > > > >>>>>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > > >>>>>>> Similar point here to the other ideas that are popping up. Maybe
> > > > > >>>>>>> we should just focus on completing 2.0 and move all discussions
> > > > > >>>>>>> about further improvements to 2.1? While those are important
> > > > > >>>>>>> discussions (and we should continue them in the near future!),
> > > > > >>>>>>> I think at this point our focus should be on delivering 2.0 in
> > > > > >>>>>>> its current shape.
> > > > >>>>>>>
> > > > >>>>>>> J.
> > > > >>>>>>>
> > > > >>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > > > >>> bin.huangxb@gmail.com>
> > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi Daniel
> > > > >>>>>>>>
> > > > >>>>>>>> I agree that the TaskGroup should have the same API as a DAG
> > > > >>> object
> > > > >>>>>>> related
> > > > >>>>>>>> to task dependencies, but it will not have anything related
> to
> > > > >>>> actual
> > > > >>>>>>>> execution or scheduling.
> > > > >>>>>>>> I will update the AIP according to this over the weekend.
> > > > >>>>>>>>
> > > > >>>>>>>>> We could even make a “DAGTemplate” object s.t. when you
> > > > >> import
> > > > >>>> the
> > > > >>>>>>> object
> > > > >>>>>>>> you can import it with parameters to determine the shape of
> > the
> > > > >>>> DAG.
> > > > >>>>>>>>
> > > > >>>>>>>> Can you elaborate a bit more on this? Does it serve a
> similar
> > > > >>>> purpose
> > > > >>>>>> as
> > > > >>>>>>> a
> > > > >>>>>>>> DAG factory function?
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > > > >>>>>>> daniel.imberman@gmail.com
> > > > >>>>>>>>>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi Bin,
> > > > >>>>>>>>>
> > > > >>>>>>>>> Why not give the TaskGroup the same API as a DAG object
> (e.g.
> > > > >>> the
> > > > >>>>>>> bitwise
> > > > >>>>>>>>> operator fro task dependencies). We could even make a
> > > > >>>> “DAGTemplate”
> > > > >>>>>>>> object
> > > > >>>>>>>>> s.t. when you import the object you can import it with
> > > > >>> parameters
> > > > >>>>> to
> > > > >>>>>>>>> determine the shape of the DAG.
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > > > >>>>> bin.huangxb@gmail.com
> > > > >>>>>>>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>> The TaskGroup will not take schedule interval as a
> parameter
> > > > >>>>> itself,
> > > > >>>>>>> and
> > > > >>>>>>>> it
> > > > >>>>>>>>> depends on the DAG where it attaches to. In my opinion, the
> > > > >>>>> TaskGroup
> > > > >>>>>>>> will
> > > > >>>>>>>>> only contain a group of tasks with interdependencies, and
> the
> > > > >>>>>> TaskGroup
> > > > >>>>>>>>> behaves like a task. It doesn't contain any
> > > > >>> execution/scheduling
> > > > >>>>>> logic
> > > > >>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs etc.)
> > > > >>> like
> > > > >>>> a
> > > > >>>>>> DAG
> > > > >>>>>>>>> does.
> > > > >>>>>>>>>
> > > > >>>>>>>>>> For example, there is the scenario that the schedule
> > > > >> interval
> > > > >>>> of
> > > > >>>>>> DAG
> > > > >>>>>>> is
> > > > >>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 min.
> > > > >>>>>>>>>
> > > > >>>>>>>>> I am curious why you ask this. Is this a use case that you
> > > > >> want
> > > > >>>> to
> > > > >>>>>>>> achieve?
> > > > >>>>>>>>>
> > > > >>>>>>>>> Bin
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> > > > >> thanosxnicholas@gmail.com
> > > > >>>>
> > > > >>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Hi Bin,
> > > > >>>>>>>>>> Using TaskGroup, Is the schedule interval of TaskGroup the
> > > > >>> same
> > > > >>>>> as
> > > > >>>>>>> the
> > > > >>>>>>>>>> parent DAG? My main concern is whether the schedule
> > > > >> interval
> > > > >>> of
> > > > >>>>>>>> TaskGroup
> > > > >>>>>>>>>> could be different with that of the DAG? For example,
> there
> > > > >>> is
> > > > >>>>> the
> > > > >>>>>>>>> scenario
> > > > >>>>>>>>>> that the schedule interval of DAG is 1 hour and the
> > > > >> schedule
> > > > >>>>>> interval
> > > > >>>>>>>> of
> > > > >>>>>>>>>> TaskGroup is 20 min.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Cheers,
> > > > >>>>>>>>>> Nicholas
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > > > >>>>>> bin.huangxb@gmail.com
> > > > >>>>>>>>
> > > > >>>>>>>>>> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Hi Nicholas,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I am not sure about the old behavior of SubDagOperator,
> > > > >>> maybe
> > > > >>>>> it
> > > > >>>>>>> will
> > > > >>>>>>>>>> throw
> > > > >>>>>>>>>>> an error? But in the original proposal, the subdag's
> > > > >>>>>>>> schedule_interval
> > > > >>>>>>>>>> will
> > > > >>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to replace
> > > > >>>> SubDag,
> > > > >>>>>>> there
> > > > >>>>>>>>>> will
> > > > >>>>>>>>>>> be no subdag schedule_interval.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Bin
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > > > >>>> thanosxnicholas@gmail.com
> > > > >>>>>>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> Hi Bin,
> > > > >>>>>>>>>>>> Thanks for your good proposal. I was confused whether
> > > > >> the
> > > > >>>>>>> schedule
> > > > >>>>>>>>>>>> interval of SubDAG is different from that of the parent
> > > > >>>> DAG?
> > > > >>>>> I
> > > > >>>>>>> have
> > > > >>>>>>>>>>>> discussed with Jiajie Zhong about the schedule interval
> > > > >>> of
> > > > >>>>>>> SubDAG.
> > > > >>>>>>>> If
> > > > >>>>>>>>>> the
> > > > >>>>>>>>>>>> SubDagOperator has a different schedule interval, what
> > > > >>> will
> > > > >>>>>>> happen
> > > > >>>>>>>>> for
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>> scheduler to schedule the parent DAG?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Regards,
> > > > >>>>>>>>>>>> Nicholas Jiang
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > > >>>>>>>> bin.huangxb@gmail.com>
> > > > >>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback!
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> I have rethought about the concept of subdag and task
> > > > >>>>>> groups. I
> > > > >>>>>>>>> think
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>> better way to approach this is to entirely remove
> > > > >>> subdag
> > > > >>>>> and
> > > > >>>>>>>>>> introduce
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>> concept of TaskGroup, which is a container of tasks
> > > > >>> along
> > > > >>>>>> with
> > > > >>>>>>>>> their
> > > > >>>>>>>>>>>>> dependencies *without execution/scheduling logic as a
> > > > >>>> DAG*.
> > > > >>>>>> The
> > > > >>>>>>>>> only
> > > > >>>>>>>>>>>>> purpose of it is to group a list of tasks, but you
> > > > >>> still
> > > > >>>>> need
> > > > >>>>>>> to
> > > > >>>>>>>>> add
> > > > >>>>>>>>>> it
> > > > >>>>>>>>>>>> to
> > > > >>>>>>>>>>>>> a DAG for execution.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Here is a small code snippet.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> ```
> > > > >>>>>>>>>>>>> class TaskGroup:
> > > > >>>>>>>>>>>>> """
> > > > >>>>>>>>>>>>> A TaskGroup contains a group of tasks.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> If default_args is missing, it will take default args
> > > > >>>> from
> > > > >>>>>> the
> > > > >>>>>>>>>> DAG.
> > > > >>>>>>>>>>>>> """
> > > > >>>>>>>>>>>>> def __init__(self, group_id, default_args):
> > > > >>>>>>>>>>>>> pass
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> """
> > > > >>>>>>>>>>>>> You can add tasks to a task group similar to adding
> > > > >>> tasks
> > > > >>>>> to
> > > > >>>>>> a
> > > > >>>>>>>> DAG
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> This can be declared in a separate file from the dag
> > > > >>> file
> > > > >>>>>>>>>>>>> """
> > > > >>>>>>>>>>>>> download_group = TaskGroup(group_id='download',
> > > > >>>>>>>>>>>> default_args=default_args)
> > > > >>>>>>>>>>>>> download_group.add_task(task1)
> > > > >>>>>>>>>>>>> task2.dag = download_group
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> with download_group:
> > > > >>>>>>>>>>>>> task3 = DummyOperator(task_id='task3')
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> [task, task2] >> task3
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> """Add it to a DAG for execution"""
> > > > >>>>>>>>>>>>> with DAG(dag_id='start_download_dag',
> > > > >>>>>>> default_args=default_args,
> > > > >>>>>>>>>>>>> schedule_interval='@daily', ...) as dag:
> > > > >>>>>>>>>>>>> start = DummyOperator(task_id='start')
> > > > >>>>>>>>>>>>> start >> download_group
> > > > >>>>>>>>>>>>> # this is equivalent to
> > > > >>>>>>>>>>>>> # start >> [task, task2] >> task3
> > > > >>>>>>>>>>>>> ```
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> With this, we can still reuse a group of tasks and
> > > > >> set
> > > > >>>>>>>> dependencies
> > > > >>>>>>>>>>>> between
> > > > >>>>>>>>>>>>> them; it avoids the boilerplate code from using
> > > > >>>>>> SubDagOperator,
> > > > >>>>>>>> and
> > > > >>>>>>>>>> we
> > > > >>>>>>>>>>>> can
> > > > >>>>>>>>>>>>> declare dependencies as `task >> task_group >> task`.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> User migration wise, we can introduce it before
> > > > >> Airflow
> > > > >>>> 2.0
> > > > >>>>>> and
> > > > >>>>>>>>> allow
> > > > >>>>>>>>>>>>> gradual transition. Then we can decide if we still
> > > > >> want
> > > > >>>> to
> > > > >>>>>> keep
> > > > >>>>>>>> the
> > > > >>>>>>>>>>>>> SubDagOperator or simply remove it.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Any thoughts?
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Cheers,
> > > > >>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > > > >>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> +1, proposal looks good.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> The original intention was really to have tasks
> > > > >>> groups
> > > > >>>>> and
> > > > >>>>>> a
> > > > >>>>>>>>>>>> zoom-in/out
> > > > >>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>> the UI. The original reasoning was to reuse the DAG
> > > > >>>>> object
> > > > >>>>>>>> since
> > > > >>>>>>>>> it
> > > > >>>>>>>>>>> is
> > > > >>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> > > > >>> create
> > > > >>>>>>>> underlying
> > > > >>>>>>>>>>>>>> confusions since a DAG is much more than just a
> > > > >> group
> > > > >>>> of
> > > > >>>>>>> tasks.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Max
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > > >>>>>>>>>>>>> joshipoornima06@gmail.com>
> > > > >>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Thank you for your email.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > > >>>>>>>>>>> bin.huangxb@gmail.com
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> > > > >>>>>> rewrites
> > > > >>>>>>>> the
> > > > >>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > >>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > > > >> it
> > > > >>>>> will
> > > > >>>>>>>> give a
> > > > >>>>>>>>>>>> flat
> > > > >>>>>>>>>>>>>>>>>> structure at
> > > > >>>>>>>>>>>>>>>>>> the task level
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> The serialized_dag representation already
> > > > >> does
> > > > >>>>> this I
> > > > >>>>>>>>> think.
> > > > >>>>>>>>>> At
> > > > >>>>>>>>>>>>> least
> > > > >>>>>>>>>>>>>>> if
> > > > >>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > > > >>> representation,
> > > > >>>>> but
> > > > >>>>>> at
> > > > >>>>>>>>> least
> > > > >>>>>>>>>>> it
> > > > >>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
> > > > >> In
> > > > >>> my
> > > > >>>>>>>> proposal
> > > > >>>>>>>>> as
> > > > >>>>>>>>>>>> also
> > > > >>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
> > > > >> from
> > > > >>>> the
> > > > >>>>>>> subdag
> > > > >>>>>>>>> and
> > > > >>>>>>>>>>> add
> > > > >>>>>>>>>>>>>> them
> > > > >>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG graph
> > > > >>>> will
> > > > >>>>>> look
> > > > >>>>>>>>>> exactly
> > > > >>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>> same as without subdag but with metadata
> > > > >> attached
> > > > >>>> to
> > > > >>>>>>> those
> > > > >>>>>>>>>>>> sections.
> > > > >>>>>>>>>>>>>>> These
> > > > >>>>>>>>>>>>>>>> metadata will be later on used to render in the
> > > > >>> UI.
> > > > >>>>> So
> > > > >>>>>>>> after
> > > > >>>>>>>>>>>> parsing
> > > > >>>>>>>>>>>>> (
> > > > >>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
> > > > >> the
> > > > >>>>>>> *root_dag
> > > > >>>>>>>>>>>> *instead
> > > > >>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>> *root_dag +
> > > > >>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > > > >>>>>>>>>> current_group=section-1,
> > > > >>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> > > > >>> naming
> > > > >>>>>>>>>>> suggestions),
> > > > >>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>> reason for parent_group is that we can have
> > > > >>> nested
> > > > >>>>>> group
> > > > >>>>>>>> and
> > > > >>>>>>>>>>>> still
> > > > >>>>>>>>>>>>>> be
> > > > >>>>>>>>>>>>>>>> able to capture the dependency.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Runtime DAG:
> > > > >>>>>>>>>>>>>>>> [image: image.png]
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> While at the UI, what we see would be something
> > > > >>>> like
> > > > >>>>>> this
> > > > >>>>>>>> by
> > > > >>>>>>>>>>>>> utilizing
> > > > >>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into
> > > > >> in
> > > > >>>> some
> > > > >>>>>>> way.
> > > > >>>>>>>>>>>>>>>> [image: image.png]
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> The benefits I can see is that:
> > > > >>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > > > >>> complexity
> > > > >>>> of
> > > > >>>>>>>> SubDag
> > > > >>>>>>>>>> for
> > > > >>>>>>>>>>>>>>> execution
> > > > >>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> > > > >> using
> > > > >>>>>> SubDag.
> > > > >>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
> > > > >>>>> reusable
> > > > >>>>>>> dag
> > > > >>>>>>>>> code
> > > > >>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>> declare dependencies between them. And with the
> > > > >>> new
> > > > >>>>>>>>>>> SubDagOperator
> > > > >>>>>>>>>>>>> (see
> > > > >>>>>>>>>>>>>>> AIP
> > > > >>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
> > > > >>>>> function
> > > > >>>>>>> for
> > > > >>>>>>>>>>>>> generating 1
> > > > >>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
> > > > >>> (in
> > > > >>>>> this
> > > > >>>>>>>> case,
> > > > >>>>>>>>>> it
> > > > >>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>> just
> > > > >>>>>>>>>>>>>>>> extract all underlying tasks and append to the
> > > > >>> root
> > > > >>>>>> dag).
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> > > > >>>> with a
> > > > >>>>>>>>>> simpler
> > > > >>>>>>>>>>>>>> concept
> > > > >>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> > > > >> out
> > > > >>>> the
> > > > >>>>>>>>>> contents
> > > > >>>>>>>>>>>> of
> > > > >>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>> SubDag
> > > > >>>>>>>>>>>>>>>> and becomes more like
> > > > >>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > > > >>>>>>>>>>>>>>> (forgive
> > > > >>>>>>>>>>>>>>>> me about the crazy name..). In this case, it is
> > > > >>>> still
> > > > >>>>>>>>>>> necessary
> > > > >>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>> keep the
> > > > >>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a
> > > > >>>> name?
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> > > > >>>> Chris
> > > > >>>>>>> Palmer
> > > > >>>>>>>>> for
> > > > >>>>>>>>>>>>> helping
> > > > >>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup, I
> > > > >>>> will
> > > > >>>>>> just
> > > > >>>>>>>>> paste
> > > > >>>>>>>>>>> it
> > > > >>>>>>>>>>>>>> here.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > > > >> in
> > > > >>>> the
> > > > >>>>>> same
> > > > >>>>>>>>>>>> TaskGroup,
> > > > >>>>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > > > >> a
> > > > >>>>>>> TaskGroup
> > > > >>>>>>>>>> and
> > > > >>>>>>>>>>>>>> either a
> > > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > > > >> in
> > > > >>>> any
> > > > >>>>>>> group
> > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > >>> TaskGroup
> > > > >>>>> and
> > > > >>>>>>>>>> either
> > > > >>>>>>>>>>>>> other
> > > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > >> as
> > > > >>> a
> > > > >>>>>> single
> > > > >>>>>>>>>>>> "object",
> > > > >>>>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > >>>>> "status"
> > > > >>>>>>> of a
> > > > >>>>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>>>> was
> > > > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> I agree with Chris:
> > > > >>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > > > >>> executor), I
> > > > >>>>>> think
> > > > >>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>>>> should
> > > > >>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> > > > >> to
> > > > >>>>>>> implement
> > > > >>>>>>>>>> some
> > > > >>>>>>>>>>>>>> metadata
> > > > >>>>>>>>>>>>>>>> operations that allows start/stop a group of
> > > > >>> tasks
> > > > >>>>>> etc.)
> > > > >>>>>>>>>>>>>>>> - From the UI's View, it should be able to pick
> > > > >>> up
> > > > >>>>> the
> > > > >>>>>>>>>> individual
> > > > >>>>>>>>>>>>>> tasks'
> > > > >>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > > > >> status
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > > > >> Imberman
> > > > >>> <
> > > > >>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator
> > > > >>> to
> > > > >>>>> tie
> > > > >>>>>>> dags
> > > > >>>>>>>>>>>> together
> > > > >>>>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>>> I
> > > > >>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if we
> > > > >>>> could
> > > > >>>>>>>>>> essentially
> > > > >>>>>>>>>>>>> write
> > > > >>>>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > > > >>>> starter-tasks
> > > > >>>>>> for
> > > > >>>>>>>>> that
> > > > >>>>>>>>>>> DAG.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
> > > > >> UI
> > > > >>>>>> concept.
> > > > >>>>>>>> It
> > > > >>>>>>>>>>>> doesn’t
> > > > >>>>>>>>>>>>>> need
> > > > >>>>>>>>>>>>>>>>> to execute separately, you’re just adding more
> > > > >>>> tasks
> > > > >>>>>> to
> > > > >>>>>>>> the
> > > > >>>>>>>>>>> queue
> > > > >>>>>>>>>>>>> that
> > > > >>>>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>>> be executed when there are resources
> > > > >> available.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> via Newton Mail [
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2
> > > > >>>>>>>>>>>>>>>>> ]
> > > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
> > > > >> <
> > > > >>>>>>>>>>> chris@crpalmer.com
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
> > > > >>>>>> abstraction.
> > > > >>>>>>> I
> > > > >>>>>>>>>> think
> > > > >>>>>>>>>>>> what
> > > > >>>>>>>>>>>>>> is
> > > > >>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
> > > > >> high
> > > > >>>>> level
> > > > >>>>>> I
> > > > >>>>>>>>> think
> > > > >>>>>>>>>>> you
> > > > >>>>>>>>>>>>> want
> > > > >>>>>>>>>>>>>>>>> this
> > > > >>>>>>>>>>>>>>>>> functionality:
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in
> > > > >>> the
> > > > >>>>>> same
> > > > >>>>>>>>>>> TaskGroup,
> > > > >>>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in a
> > > > >>>>>> TaskGroup
> > > > >>>>>>>> and
> > > > >>>>>>>>>>>> either
> > > > >>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not in
> > > > >>> any
> > > > >>>>>> group
> > > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > > >>> TaskGroup
> > > > >>>>> and
> > > > >>>>>>>> either
> > > > >>>>>>>>>>> other
> > > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > > >> as a
> > > > >>>>>> single
> > > > >>>>>>>>>>> "object",
> > > > >>>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > > >>>> "status"
> > > > >>>>>> of
> > > > >>>>>>> a
> > > > >>>>>>>>>>>> TaskGroup
> > > > >>>>>>>>>>>>>> was
> > > > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
> > > > >>> object
> > > > >>>>>> with
> > > > >>>>>>>> its
> > > > >>>>>>>>>> own
> > > > >>>>>>>>>>>>>> database
> > > > >>>>>>>>>>>>>>>>> table and model or just another attribute on
> > > > >>>> tasks.
> > > > >>>>> I
> > > > >>>>>>>> think
> > > > >>>>>>>>>> you
> > > > >>>>>>>>>>>>> could
> > > > >>>>>>>>>>>>>>>>> build
> > > > >>>>>>>>>>>>>>>>> it in a way such that from the schedulers
> > > > >> point
> > > > >>> of
> > > > >>>>>> view
> > > > >>>>>>> a
> > > > >>>>>>>>> DAG
> > > > >>>>>>>>>>> with
> > > > >>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
> > > > >> differently.
> > > > >>> So
> > > > >>>>> it
> > > > >>>>>>>> really
> > > > >>>>>>>>>>> just
> > > > >>>>>>>>>>>>>>> becomes
> > > > >>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>> shortcut for setting dependencies between sets
> > > > >>> of
> > > > >>>>>> Tasks,
> > > > >>>>>>>> and
> > > > >>>>>>>>>>>> allows
> > > > >>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> UI
> > > > >>>>>>>>>>>>>>>>> to simplify the render of the DAG structure.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Chris
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > > >>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
> > > > >> the
> > > > >>>> more
> > > > >>>>>>>>> important
> > > > >>>>>>>>>>>> issue
> > > > >>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>> fix),
> > > > >>>>>>>>>>>>>>>>>> but I am still convinced Ash' idea is the
> > > > >>> right
> > > > >>>>> way
> > > > >>>>>>>>> forward
> > > > >>>>>>>>>>>> (just
> > > > >>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>>> might
> > > > >>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
> > > > >>> adding
> > > > >>>>>>> visual
> > > > >>>>>>>>>>> grouping
> > > > >>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>> UI).
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> There was a previous thread about this FYI
> > > > >>> with
> > > > >>>>> more
> > > > >>>>>>>>> context
> > > > >>>>>>>>>>> on
> > > > >>>>>>>>>>>>> why
> > > > >>>>>>>>>>>>>>>>> subdags
> > > > >>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>
> > > > >> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > >>>>>>>>>>>>>> . A
> > > > >>>>>>>>>>>>>>>>>> solution I outline there to Jame's problem
> > > > >> is
> > > > >>>> e.g.
> > > > >>>>>>>>> enabling
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> operator
> > > > >>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as
> > > > >>>> well. I
> > > > >>>>>> see
> > > > >>>>>>>>> this
> > > > >>>>>>>>>>>> being
> > > > >>>>>>>>>>>>>>>>> separate
> > > > >>>>>>>>>>>>>>>>>> from Ash' solution for DAG grouping in the
> > > > >> UI
> > > > >>>> but
> > > > >>>>>> one
> > > > >>>>>>> of
> > > > >>>>>>>>> the
> > > > >>>>>>>>>>> two
> > > > >>>>>>>>>>>>>> items
> > > > >>>>>>>>>>>>>>>>>> required to replace all existing subdag
> > > > >>>>>> functionality.
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
> > > > >> and
> > > > >>>>> they
> > > > >>>>>>> are
> > > > >>>>>>>>>>> always a
> > > > >>>>>>>>>>>>>> giant
> > > > >>>>>>>>>>>>>>>>> pain
> > > > >>>>>>>>>>>>>>>>>> to use. They are a constant source of user
> > > > >>>>> confusion
> > > > >>>>>>> and
> > > > >>>>>>>>>>>> breakages
> > > > >>>>>>>>>>>>>>>>> during
> > > > >>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
> > > > >> Coder <
> > > > >>>>>>>>>>>> jcoder01@gmail.com>
> > > > >>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a
> > > > >> UI
> > > > >>>>>>> concept. I
> > > > >>>>>>>>> use
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> subdag
> > > > >>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If
> > > > >>> you
> > > > >>>>>> have a
> > > > >>>>>>>>> group
> > > > >>>>>>>>>>> of
> > > > >>>>>>>>>>>>>> tasks
> > > > >>>>>>>>>>>>>>>>> that
> > > > >>>>>>>>>>>>>>>>>>> need to finish before another group of
> > > > >> tasks
> > > > >>>>>> start,
> > > > >>>>>>>>> using
> > > > >>>>>>>>>> a
> > > > >>>>>>>>>>>>> subdag
> > > > >>>>>>>>>>>>>>> is
> > > > >>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies
> > > > >>>> and I
> > > > >>>>>>> think
> > > > >>>>>>>>>> also
> > > > >>>>>>>>>>>> make
> > > > >>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>>>> easier
> > > > >>>>>>>>>>>>>>>>>>> to follow the dag code.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
> > > > >> Hamlin
> > > > >>> <
> > > > >>>>>>>>>>>>> hamlin.kn@gmail.com>
> > > > >>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > > >>>>>> Berlin-Taylor
> > > > >>>>>>> <
> > > > >>>>>>>>>>>>>> ash@apache.org
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Question:
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator
> > > > >>>> anymore?
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just
> > > > >>>>> replacing
> > > > >>>>>> it
> > > > >>>>>>>>> with
> > > > >>>>>>>>>> a
> > > > >>>>>>>>>>> UI
> > > > >>>>>>>>>>>>>>>>> grouping
> > > > >>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less
> > > > >> to
> > > > >>>> get
> > > > >>>>>>>> wrong,
> > > > >>>>>>>>>> and
> > > > >>>>>>>>>>>>> closer
> > > > >>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>> what
> > > > >>>>>>>>>>>>>>>>>>>>> users actually want to achieve with
> > > > >>>> subdags?
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in
> > > > >>>> subdags
> > > > >>>>>>> could
> > > > >>>>>>>>>> start
> > > > >>>>>>>>>>>>>> running
> > > > >>>>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should
> > > > >> we
> > > > >>>> not
> > > > >>>>>>> also
> > > > >>>>>>>>> just
> > > > >>>>>>>>>>>>>>> _enitrely_
> > > > >>>>>>>>>>>>>>>>>>> remove
> > > > >>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace
> > > > >> it
> > > > >>>> with
> > > > >>>>>>>>> something
> > > > >>>>>>>>>>>>>> simpler.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I
> > > > >>> haven't
> > > > >>>>> used
> > > > >>>>>>>> them
> > > > >>>>>>>>>>>>>> extensively
> > > > >>>>>>>>>>>>>>> so
> > > > >>>>>>>>>>>>>>>>>> may
> > > > >>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> > > > >>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it
> > > > >>>> has(?)
> > > > >>>>> to
> > > > >>>>>>> be
> > > > >>>>>>>> of
> > > > >>>>>>>>>> the
> > > > >>>>>>>>>>>>> form
> > > > >>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
> > > > >>>>>>>>>>>>>>>>>>>>> - They need their own
> > > > >> schedule_interval,
> > > > >>>> but
> > > > >>>>>> it
> > > > >>>>>>>> has
> > > > >>>>>>>>> to
> > > > >>>>>>>>>>>> match
> > > > >>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>> parent
> > > > >>>>>>>>>>>>>>>>>>>> dag
> > > > >>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own.
> > > > >>>> (Does
> > > > >>>>>> it
> > > > >>>>>>>> make
> > > > >>>>>>>>>>> sense
> > > > >>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>> do
> > > > >>>>>>>>>>>>>>>>>> this?
> > > > >>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the
> > > > >>> sub
> > > > >>>>> dag
> > > > >>>>>>>> would
> > > > >>>>>>>>>>> never
> > > > >>>>>>>>>>>>>>>>> execute, so
> > > > >>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.
> > > > >>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to
> > > > >>>>> operator a
> > > > >>>>>>>>> subdag
> > > > >>>>>>>>>>> with
> > > > >>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>>>>> always
> > > > >>>>>>>>>>>>>>>>>> a
> > > > >>>>>>>>>>>>>>>>>>>>> bit of a kludge.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Thoughts?
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> -ash
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash
> > > > >>>>>> Berlin-Taylor <
> > > > >>>>>>>>>>>>>> ash@apache.org>
> > > > >>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm excited to see how this progresses.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> > > > >>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* method to unpack subdag while parsing, and it will give
> > > > >>>>>>>>>>>>>>>>>>>>>>> a flat structure at the task level
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I think. At least if
> > > > >>>>>>>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> -ash
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone to collect feedback on the AIP-34 on
> > > > >>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was previously briefly mentioned in the
> > > > >>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be done for Airflow 2.0, and one of the
> > > > >>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator attach tasks back to the root DAG.
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving SubDagOperator related issues by
> > > > >>>>>>>>>>>>>>>>>>>>>>> reattaching all tasks back to the root dag while respecting dependencies
> > > > >>>>>>>>>>>>>>>>>>>>>>> during parsing. The original grouping effect on the UI will be achieved
> > > > >>>>>>>>>>>>>>>>>>>>>>> through grouping related tasks by metadata.
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory function more reusable because you don't
> > > > >>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and child_dag_name in the function
> > > > >>>>>>>>>>>>>>>>>>>>>>> signature anymore.
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> > > > >>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* method to unpack subdag while parsing, and it will give
> > > > >>>>>>>>>>>>>>>>>>>>>>> a flat structure at the task level
> > > > >>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts like a
> > > > >>>>>>>>>>>>>>>>>>>>>>> container and most of the original methods are removed. The signature is
> > > > >>>>>>>>>>>>>>>>>>>>>>> also changed to *subdag_factory* with *subdag_args* and *subdag_kwargs*.
> > > > >>>>>>>>>>>>>>>>>>>>>>> This is similar to the PythonOperator signature.
> > > > >>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group & parent_group attributes
> > > > >>>>>>>>>>>>>>>>>>>>>>> to BaseOperator*: This metadata is used to group tasks for rendering at
> > > > >>>>>>>>>>>>>>>>>>>>>>> UI level. It may potentially extend further to group arbitrary tasks
> > > > >>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to allow group-level operations (i.e.
> > > > >>>>>>>>>>>>>>>>>>>>>>> stop/trigger a group of tasks within the dag)
> > > > >>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to allow
> > > > >>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a flat structure to pair with the
> > > > >>>>>>>>>>>>>>>>>>>>>>> first change instead of the original hierarchical structure.
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Please see related documents and PRs for details:
> > > > >>>>>>>>>>>>>>>>>>>>>>> AIP:
> > > > >>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
> > > > >>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any aspects that you agree/disagree
> > > > >>>>>>>>>>>>>>>>>>>>>>> with or need more clarification (especially the third change regarding
> > > > >>>>>>>>>>>>>>>>>>>>>>> TaskGroup). Any comments are welcome and I am looking forward to it!
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > > >>>>>>>>>>>>>>>>>>>>>>> Bin
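
The "reattach all tasks to the root dag while respecting dependencies" idea from the proposal above can be sketched without Airflow at all. This is only an illustration under assumptions: the `reattach_subdag` helper, the plain-dict graph representation, and the `subdag_id.task_id` naming for flattened tasks are hypothetical and are not taken from the AIP or from either PR.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: task_id -> ids of its upstream tasks.
Deps = Dict[str, Set[str]]


def reattach_subdag(parent: Deps, subdag_id: str, child: Deps) -> Tuple[Deps, Dict[str, str]]:
    """Replace the placeholder task `subdag_id` in `parent` with the tasks of `child`.

    Returns the flattened dependency map plus grouping metadata
    (task_id -> group id) that a UI could use to draw the group.
    """
    # Prefix child task ids so they stay unique in the root graph.
    prefixed = {
        f"{subdag_id}.{task}": {f"{subdag_id}.{up}" for up in ups}
        for task, ups in child.items()
    }
    roots = {task for task, ups in prefixed.items() if not ups}
    used_as_upstream = {up for ups in prefixed.values() for up in ups}
    leaves = set(prefixed) - used_as_upstream

    flat: Deps = {}
    for task, ups in parent.items():
        if task == subdag_id:
            continue  # the placeholder itself disappears
        # Anything that waited on the subdag now waits on its leaf tasks.
        flat[task] = (ups - {subdag_id}) | (leaves if subdag_id in ups else set())
    for task, ups in prefixed.items():
        # Root tasks of the subdag inherit the placeholder's upstream tasks.
        flat[task] = ups | (parent[subdag_id] if task in roots else set())

    groups = {task: subdag_id for task in prefixed}
    return flat, groups
```

After flattening, scheduling only ever sees one graph; the returned `groups` mapping is the kind of metadata the proposal suggests the UI could use for grouping.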
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>>> Thanks & Regards
> > > > >>>>>>>>>>>>>>> Poornima
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>>
> > > > >>>>>>> Jarek Potiuk
> > > > >>>>>>> Polidea <https://www.polidea.com/> | Principal Software
> > Engineer
> > > > >>>>>>>
> > > > >>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > > >>>>> <+48%20660%20796%20129>>
> > > > >>>>>>> [image: Polidea] <https://www.polidea.com/>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> --
> > > > >>>>>
> > > > >>>>> Jarek Potiuk
> > > > >>>>> Polidea <https://www.polidea.com/> | Principal Software
> Engineer
> > > > >>>>>
> > > > >>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > > >>>>> <+48%20660%20796%20129>>
> > > > >>>>> [image: Polidea] <https://www.polidea.com/>
> > > > >>>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> --
> > > > >>>>
> > > > >>>> *Jacob Ferriero*
> > > > >>>>
> > > > >>>> Strategic Cloud Engineer: Data Engineering
> > > > >>>>
> > > > >>>> jferriero@google.com
> > > > >>>>
> > > > >>>> 617-714-2509
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by Yu Qian <yu...@gmail.com>.
Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!

On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <ka...@gmail.com> wrote:

> What's your ID? If you haven't created an account yet, please create
> one at https://cwiki.apache.org/confluence/signup.action and send us your
> ID and we will add permissions.
>
>
>
> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yu...@gmail.com> wrote:
>
> > Re, Xinbin. Thanks. I'll edit the AIP. May I request permission to edit
> it?
> > My wiki user email is yuqian1990@gmail.com.
> >
> > Re Gerard: yes the UI loads all the nodes as json from the web server at
> > once. However, it only adds the top level nodes and edges to the graph
> when
> > the Graph View page is first opened. And then adds the expanded nodes to
> > the graph as the user expands them. From what I've experienced with DAGs
> > containing around 400 tasks (not using TaskGroup or SubDagOperator),
> > opening the whole dag in Graph View usually takes 5 seconds. Less than
> 60ms
> > of that is taken by loading the data from webserver. The remaining 4.9s+
> is
> > taken by javascript functions in dagre-d3.min.js such as createNodes,
> > createEdgeLabels, etc and by rendering the graph. With TaskGroup being
> used
> > to group tasks into a smaller number of top-level nodes, the amount of
> data
> > loaded from webserver will remain about the same compared to a flat dag
> of
> > the same size, but the number of nodes and edges that need to be plotted on the
> > graph can be reduced significantly. So in theory this should speed up the
> > time it takes to open Graph View even without lazy-loading the data (I'll
> > experiment to find out). That said, if it comes to a point lazy-loading
> > helps, we can still implement it as an improvement.
> >
> > Re James: the Tree View looks as if all the groups are fully
> expanded.
> > (because under the hood all the tasks are in a single DAG). I'm less
> > worried about Tree View at the moment because it already has a mechanism
> > for collapsing tasks by the dependency tree. That said, the Tree View can
> > definitely be improved too with TaskGroup. (e.g. collapse tasks in the
> same
> > TaskGroup when Tree View is first opened).
> >
> > For both suggestions, implementing them doesn't require fundamental changes
> > to the idea. I think we can have a basic working TaskGroup first, and
> then
> > improve it incrementally in several PRs as we get more feedback from the
> > community. What do you think?
> >
> > Qian
> >
> >
> > On Wed, Aug 12, 2020 at 9:15 AM James Coder <jc...@gmail.com> wrote:
> >
> > > I agree this looks great, one question, how does the tree view look?
> > >
> > > James Coder
> > >
> > > > On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <
> gcasassaez@twitter.com
> > .invalid>
> > > wrote:
> > > >
> > > > First of all, this is awesome!!
> > > >
> > > > Secondly, checking your UI code, seems you are loading all operators
> at
> > > > once. Wondering if we can load them as needed (aka load whenever we
> > click
> > > > the TaskGroup). Some of our DAGs are so large that they take forever to
> load
> > > on
> > > > the Graph view, so worried about this still being an issue here. It
> may
> > > be
> > > > easily solvable by implementing lazy loading of the graph. Not sure
> how
> > > > easy to implement/add to the UI extension (and don't want to push for
> > > early
> > > > optimization as it's the root of all evil).
> > > > Gerard Casas Saez
> > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > >
> > > >
> > > >> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <
> bin.huangxb@gmail.com>
> > > wrote:
> > > >>
> > > >> Hi Yu,
> > > >>
> > > >> Thank you so much for taking on this. I was fairly distracted
> > previously
> > > >> and I didn't have the time to update the proposal. In fact, after
> > > >> discussing with Ash, Kaxil and Daniel, the direction of this AIP has
> > > been
> > > >> changed to favor the concept of TaskGroup instead of rewriting
> > > >> SubDagOperator (though it may make sense to deprecate SubDag at a
> > future
> > > >> date).
> > > >>
> > > >> Your PR is amazing and it has implemented the desired features. I
> think
> > > we
> > > >> can focus on your new PR instead. Do you mind updating the AIP based
> > on
> > > >> what you have done in your PR?
> > > >>
> > > >> Best,
> > > >> Bin
> > > >>
> > > >>
> > > >>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yu...@gmail.com>
> > wrote:
> > > >>>
> > > >>> Hi, all, I've added the basic UI changes to my proposed
> > implementation
> > > of
> > > >>> TaskGroup as UI grouping concept:
> > > >>> https://github.com/apache/airflow/pull/10153
> > > >>>
> > > > >>> I think Chris had a pretty good specification of TaskGroup so I'm quoting
> > > > >>> it here. The only thing I don't fully agree with is the restriction
> > > > >>> "... *cannot* have dependencies between a Task in a TaskGroup and
> > > > >>> either a Task in a different TaskGroup or a Task not in any group".
> > > > >>> I think this is overly restrictive. Since TaskGroup is a UI concept, tasks
> > > have
> > > >>> dependencies on tasks in other TaskGroup or not in any TaskGroup.
> In
> > my
> > > >> PR,
> > > >>> this is allowed. The graph edges will update accordingly when
> > > TaskGroups
> > > >>> are expanded/collapsed. TaskGroup is only helping to make the UI
> look
> > > >> less
> > > >>> crowded. Under the hood, everything is still a DAG of tasks and
> edges
> > > so
> > > >>> things work normally. Here's a screenshot
> > > >>> <
> > > >>>
> > > >>
> > >
> >
> https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif
> > > >>>>
> > > >>> of the UI interaction.
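
One way to picture "the graph edges will update accordingly when TaskGroups are expanded/collapsed" is as a projection of the task-level edges onto whatever nodes are currently drawn. The sketch below is an assumption-laden illustration, not the code in the PR: `visible_edges`, `group_of`, and `expanded` are hypothetical names, and it only handles a single level of grouping (no nested groups).

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (upstream task_id, downstream task_id)


def visible_edges(edges: Iterable[Edge], group_of: Dict[str, str], expanded: Set[str]) -> Set[Edge]:
    """Project task-level edges onto the nodes currently drawn in the graph."""

    def node_for(task_id: str) -> str:
        group = group_of.get(task_id)
        # A task inside a collapsed group is drawn as the group node itself.
        return task_id if group is None or group in expanded else group

    projected = {(node_for(up), node_for(down)) for up, down in edges}
    # Edges between two tasks of the same collapsed group collapse to a
    # self-loop, which is dropped.
    return {(up, down) for up, down in projected if up != down}
```

The underlying edges never change; only the projection does, which is consistent with the point that everything is still a single DAG of tasks under the hood.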
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> *   - Tasks can be added to a TaskGroup   - You *can* have
> > dependencies
> > > >>> between Tasks in the same TaskGroup, but   *cannot* have
> dependencies
> > > >>> between a Task in a TaskGroup and either a   Task in a different
> > > >> TaskGroup
> > > >>> or a Task not in any group   - You *can* have dependencies between
> a
> > > >>> TaskGroup and either other   TaskGroups or Tasks not in any group
>  -
> > > The
> > > >>> UI will by default render a TaskGroup as a single "object", but
> >  which
> > > >> you
> > > >>> expand or zoom into in some way   - You'd need some way to
> determine
> > > what
> > > >>> the "status" of a TaskGroup was   at least for UI display purposes*
> > > >>>
> > > >>>
> > > >>> Regarding Jake's comment, I agree it's possible to implement the
> > > >> "retrying
> > > >>> tasks in a group" pattern he mentioned as an optional feature of
> > > >> TaskGroup
> > > >>> although that may go against having TaskGroup as a pure UI concept.
> > For
> > > >> the
> > > >>> motivating example Jake provided, I suggest implementing both
> > > >>> SubmitLongRunningJobTask and PollJobStatusSensor in a single
> > operator.
> > > It
> > > >>> can do something like BaseSensorOperator.execute() does in
> > "reschedule"
> > > >>> mode, i.e. it first executes some code to submit the long running
> job
> > > to
> > > > >>> the external service, and stores the state (e.g. in XCom). Then it
> > > > >>> reschedules itself. Subsequent runs then poke for the completion state.
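
The "submit once, then only poll on later runs" pattern described above can be sketched in plain Python. This is a framework-free illustration under assumptions: `SubmitAndPoll`, its `poke` method, and the dict used as a state store are hypothetical stand-ins. In Airflow the same logic would presumably live in a sensor running in "reschedule" mode, with the job id kept somewhere persistent such as XCom.

```python
from typing import Callable, Dict


class SubmitAndPoll:
    """Submit a job on the first run, then only poll on later runs."""

    def __init__(
        self,
        submit: Callable[[], str],
        is_done: Callable[[str], bool],
        state: Dict[str, str],
    ) -> None:
        self.submit = submit
        self.is_done = is_done
        # Stand-in for a persistent store that survives between runs.
        self.state = state

    def poke(self) -> bool:
        """Return True when the job is finished, False to be rescheduled."""
        job_id = self.state.get("job_id")
        if job_id is None:
            # First run: submit, remember the job id, and ask to be rescheduled.
            self.state["job_id"] = self.submit()
            return False
        # Later runs: only check whether the job has finished.
        return self.is_done(job_id)
```

Retrying the submission after a failed poll would then amount to clearing the stored job id before the next run, which is the behaviour the two-task SubDag was being used for.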
> > > >>>
> > > >>>
> > > >>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> > > >> <jferriero@google.com.invalid
> > > >>>>
> > > >>> wrote:
> > > >>>
> > > >>>> I really like this idea of a TaskGroup container as I think this
> > will
> > > >> be
> > > >>>> much easier to use than SubDag.
> > > >>>>
> > > >>>> I'd like to propose an optional behavior for special retry
> mechanics
> > > >> via
> > > >>> a
> > > >>>> TaskGroup.retry_all property.
> > > >>>> This way I could use TaskGroup to replace my favorite use of
> SubDag
> > > for
> > > >>>> atomically retrying tasks of the pattern "act on external state
> then
> > > >>>> reschedule poll until desired state reached".
> > > >>>>
> > > >>>> Motivating use case I have for a SubDag is very simple two task
> > group
> > > >>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > > >>>> I use SubDag because it gives me an easy way to retry the
> > > >>> SubmitJobTask
> > > >>>> if something about the PollJobSensor fails.
> > > >>>> This pattern would be really nice for jobs that are expected to
> run
> > a
> > > >>> long
> > > > >>>> time (because the sensor can use reschedule mode, freeing up
> > > >> slots)
> > > >>>> but might fail for a retryable reason.
> > > >>>> However, using SubDag to meet this use case defeats the purpose
> > > because
> > > >>>> SubDag infamously
> > > >>>> <
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10
> > > >>>>>
> > > >>>> blocks a "controller" slot for the entire duration.
> > > > >>>> This may feel like a cyclic behavior but in reality it is very common
> > for
> > > >> a
> > > >>>> single operator to submit job / wait til done.
> > > > >>>> We could use this case to refactor many operators (e.g. BQ, Dataproc,
> > > >>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask]
> > with
> > > >> an
> > > >>>> optional reschedule mode if user knows that this job may take a
> long
> > > >>> time.
> > > >>>>
> > > > >>>> I'd be happy to do the development work on adding this specific retry
> > > >>> behavior
> > > >>>> to TaskGroup once the base concept is implemented if others in the
> > > >>>> community would find this a useful feature.
> > > >>>>
> > > >>>> Cheers,
> > > >>>> Jake
> > > >>>>
> > > >>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> > > Jarek.Potiuk@polidea.com
> > > >>>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> All for it :) . I think we are getting closer to have regular
> > > >> planning
> > > >>>> and
> > > >>>>> making some structured approach to 2.0 and starting task force
> for
> > it
> > > >>>> soon,
> > > >>>>> so I think this should be perfectly fine to discuss and even
> start
> > > >>>>> implementing what's beyond as soon as we make sure that we are
> > > >>>> prioritizing
> > > >>>>> 2.0 work.
> > > >>>>>
> > > >>>>> J,
> > > >>>>>
> > > >>>>>
> > > >>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yu...@gmail.com>
> > > >> wrote:
> > > >>>>>
> > > >>>>>> Hi Jarek,
> > > >>>>>>
> > > >>>>>> I agree we should not change the behaviour of the existing
> > > >>>> SubDagOperator
> > > >>>>>> till Airflow 2.1. Is it okay to continue the discussion about
> > > >>> TaskGroup
> > > >>>>> as
> > > >>>>>> a brand new concept/feature independent from the existing
> > > >>>> SubDagOperator?
> > > >>>>>> In other words, shall we add TaskGroup as a UI grouping concept
> > > >> like
> > > >>>> Ash
> > > > >>>>>> suggested, and not touch SubDagOperator at all? Whenever we are
> > > >>> ready
> > > >>>>> with
> > > >>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
> > > >>>>>>
> > > >>>>>> I really like Ash's idea of simplifying the SubDagOperator idea
> > > >> into
> > > >>> a
> > > >>>>>> simple UI grouping concept. I think Xinbin's idea of
> "reattaching
> > > >> all
> > > >>>> the
> > > >>>>>> tasks to the root DAG" is the way to go. And I see James pointed
> > > >> out
> > > >>> we
> > > >>>>>> need some helper functions to simplify dependencies setting of
> > > >>>> TaskGroup.
> > > >>>>>> Xinbin put up a pretty elegant example in his PR
> > > >>>>>> <https://github.com/apache/airflow/pull/9243>. I think having
> > > >>>> TaskGroup
> > > >>>>> as
> > > >>>>>> a UI concept should be a relatively small change. We can
> simplify
> > > >>>>> Xinbin's
> > > >>>>>> PR further. So I put up this alternative proposal here:
> > > >>>>>> https://github.com/apache/airflow/pull/10153
> > > >>>>>>
> > > >>>>>> I have not done any UI changes due to lack of experience with
> web
> > > >> UI.
> > > >>>> If
> > > >>>>>> anyone's interested, please take a look at the PR.
> > > >>>>>>
> > > >>>>>> Qian
> > > >>>>>>
> > > >>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > > >>> Jarek.Potiuk@polidea.com
> > > >>>>>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Similar point here to the other ideas that are popping up.
> Maybe
> > > >> we
> > > >>>>>> should
> > > >>>>>>> just focus on completing 2.0 and make all discussions about
> > > >> further
> > > >>>>>>> improvements to 2.1? While those are important discussions (and
> > > >> we
> > > >>>>> should
> > > >>>>>>> continue them in the  near future !) I think at this point
> > > >> focusing
> > > >>>> on
> > > >>>>>>> delivering 2.0 in its current shape should be our focus now ?
> > > >>>>>>>
> > > >>>>>>> J.
> > > >>>>>>>
> > > >>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > > >>> bin.huangxb@gmail.com>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi Daniel
> > > >>>>>>>>
> > > >>>>>>>> I agree that the TaskGroup should have the same API as a DAG
> > > >>> object
> > > >>>>>>> related
> > > >>>>>>>> to task dependencies, but it will not have anything related to
> > > >>>> actual
> > > >>>>>>>> execution or scheduling.
> > > >>>>>>>> I will update the AIP according to this over the weekend.
> > > >>>>>>>>
> > > >>>>>>>>> We could even make a “DAGTemplate” object s.t. when you
> > > >> import
> > > >>>> the
> > > >>>>>>> object
> > > >>>>>>>> you can import it with parameters to determine the shape of
> the
> > > >>>> DAG.
> > > >>>>>>>>
> > > >>>>>>>> Can you elaborate a bit more on this? Does it serve a similar
> > > >>>> purpose
> > > >>>>>> as
> > > >>>>>>> a
> > > >>>>>>>> DAG factory function?
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > > >>>>>>> daniel.imberman@gmail.com
> > > >>>>>>>>>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi Bin,
> > > >>>>>>>>>
> > > >>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g.
> > > >>> the
> > > >>>>>>> bitwise
> > > > >>>>>>>>> operator for task dependencies)? We could even make a
> > > >>>> “DAGTemplate”
> > > >>>>>>>> object
> > > >>>>>>>>> s.t. when you import the object you can import it with
> > > >>> parameters
> > > >>>>> to
> > > >>>>>>>>> determine the shape of the DAG.
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > > >>>>> bin.huangxb@gmail.com
> > > >>>>>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>> The TaskGroup will not take schedule interval as a parameter
> > > >>>>> itself,
> > > >>>>>>> and
> > > >>>>>>>> it
> > > >>>>>>>>> depends on the DAG where it attaches to. In my opinion, the
> > > >>>>> TaskGroup
> > > >>>>>>>> will
> > > >>>>>>>>> only contain a group of tasks with interdependencies, and the
> > > >>>>>> TaskGroup
> > > >>>>>>>>> behaves like a task. It doesn't contain any
> > > >>> execution/scheduling
> > > >>>>>> logic
> > > >>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs etc.)
> > > >>> like
> > > >>>> a
> > > >>>>>> DAG
> > > >>>>>>>>> does.
> > > >>>>>>>>>
> > > >>>>>>>>>> For example, there is the scenario that the schedule
> > > >> interval
> > > >>>> of
> > > >>>>>> DAG
> > > >>>>>>> is
> > > >>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 min.
> > > >>>>>>>>>
> > > >>>>>>>>> I am curious why you ask this. Is this a use case that you
> > > >> want
> > > >>>> to
> > > >>>>>>>> achieve?
> > > >>>>>>>>>
> > > >>>>>>>>> Bin
> > > >>>>>>>>>
> > > >>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> > > >> thanosxnicholas@gmail.com
> > > >>>>
> > > >>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi Bin,
> > > >>>>>>>>>> Using TaskGroup, Is the schedule interval of TaskGroup the
> > > >>> same
> > > >>>>> as
> > > >>>>>>> the
> > > >>>>>>>>>> parent DAG? My main concern is whether the schedule
> > > >> interval
> > > >>> of
> > > >>>>>>>> TaskGroup
> > > >>>>>>>>>> could be different with that of the DAG? For example, there
> > > >>> is
> > > >>>>> the
> > > >>>>>>>>> scenario
> > > >>>>>>>>>> that the schedule interval of DAG is 1 hour and the
> > > >> schedule
> > > >>>>>> interval
> > > >>>>>>>> of
> > > >>>>>>>>>> TaskGroup is 20 min.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Cheers,
> > > >>>>>>>>>> Nicholas
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > > >>>>>> bin.huangxb@gmail.com
> > > >>>>>>>>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi Nicholas,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I am not sure about the old behavior of SubDagOperator,
> > > >>> maybe
> > > >>>>> it
> > > >>>>>>> will
> > > >>>>>>>>>> throw
> > > >>>>>>>>>>> an error? But in the original proposal, the subdag's
> > > >>>>>>>> schedule_interval
> > > >>>>>>>>>> will
> > > >>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to replace
> > > >>>> SubDag,
> > > >>>>>>> there
> > > >>>>>>>>>> will
> > > >>>>>>>>>>> be no subdag schedule_interval.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Bin
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > > >>>> thanosxnicholas@gmail.com
> > > >>>>>>
> > > >>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> Hi Bin,
> > > >>>>>>>>>>>> Thanks for your good proposal. I was confused whether
> > > >> the
> > > >>>>>>> schedule
> > > >>>>>>>>>>>> interval of SubDAG is different from that of the parent
> > > >>>> DAG?
> > > >>>>> I
> > > >>>>>>> have
> > > >>>>>>>>>>>> discussed with Jiajie Zhong about the schedule interval
> > > >>> of
> > > >>>>>>> SubDAG.
> > > >>>>>>>> If
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>> SubDagOperator has a different schedule interval, what
> > > >>> will
> > > >>>>>>> happen
> > > >>>>>>>>> for
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>> scheduler to schedule the parent DAG?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Regards,
> > > >>>>>>>>>>>> Nicholas Jiang
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > >>>>>>>> bin.huangxb@gmail.com>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thank you Max, Kaxil, and everyone for the feedback!
> > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> I have rethought the concept of subdag and task
> > > >>>>>> groups. I
> > > >>>>>>>>> think
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>> better way to approach this is to entirely remove
> > > >>> subdag
> > > >>>>> and
> > > >>>>>>>>>> introduce
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>> concept of TaskGroup, which is a container of tasks
> > > >>> along
> > > >>>>>> with
> > > >>>>>>>>> their
> > > >>>>>>>>>>>>> dependencies *without execution/scheduling logic as a
> > > >>>> DAG*.
> > > >>>>>> The
> > > >>>>>>>>> only
> > > >>>>>>>>>>>>> purpose of it is to group a list of tasks, but you
> > > >>> still
> > > >>>>> need
> > > >>>>>>> to
> > > >>>>>>>>> add
> > > >>>>>>>>>> it
> > > >>>>>>>>>>>> to
> > > >>>>>>>>>>>>> a DAG for execution.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Here is a small code snippet.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> ```
> > > >>>>>>>>>>>>> class TaskGroup:
> > > >>>>>>>>>>>>> """
> > > >>>>>>>>>>>>> A TaskGroup contains a group of tasks.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> If default_args is missing, it will take default args
> > > >>>> from
> > > >>>>>> the
> > > >>>>>>>>>> DAG.
> > > >>>>>>>>>>>>> """
> > > >>>>>>>>>>>>> def __init__(self, group_id, default_args):
> > > >>>>>>>>>>>>> pass
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> """
> > > >>>>>>>>>>>>> You can add tasks to a task group similar to adding
> > > >>> tasks
> > > >>>>> to
> > > >>>>>> a
> > > >>>>>>>> DAG
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> This can be declared in a separate file from the dag
> > > >>> file
> > > >>>>>>>>>>>>> """
> > > >>>>>>>>>>>>> download_group = TaskGroup(group_id='download',
> > > >>>>>>>>>>>> default_args=default_args)
> > > >>>>>>>>>>>>> download_group.add_task(task1)
> > > >>>>>>>>>>>>> task2.dag = download_group
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> with download_group:
> > > >>>>>>>>>>>>> task3 = DummyOperator(task_id='task3')
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> [task, task2] >> task3
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> """Add it to a DAG for execution"""
> > > >>>>>>>>>>>>> with DAG(dag_id='start_download_dag',
> > > >>>>>>> default_args=default_args,
> > > >>>>>>>>>>>>> schedule_interval='@daily', ...) as dag:
> > > >>>>>>>>>>>>> start = DummyOperator(task_id='start')
> > > >>>>>>>>>>>>> start >> download_group
> > > >>>>>>>>>>>>> # this is equivalent to
> > > >>>>>>>>>>>>> # start >> [task, task2] >> task3
> > > >>>>>>>>>>>>> ```
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> With this, we can still reuse a group of tasks and set dependencies
> > > >>>>>>>>>>>>> between them; it avoids the boilerplate code from using
> > > >>>>>>>>>>>>> SubDagOperator, and we can declare dependencies as
> > > >>>>>>>>>>>>> `task >> task_group >> task`.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> User migration wise, we can introduce it before Airflow 2.0 and
> > > >>>>>>>>>>>>> allow gradual transition. Then we can decide if we still want to
> > > >>>>>>>>>>>>> keep the SubDagOperator or simply remove it.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Any thoughts?
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > > >>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> +1, proposal looks good.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> The original intention was really to have task groups and a
> > > >>>>>>>>>>>>>> zoom-in/out in the UI. The original reasoning was to reuse the DAG
> > > >>>>>>>>>>>>>> object since it is a group of tasks, but as highlighted here it
> > > >>>>>>>>>>>>>> does create underlying confusions since a DAG is much more than
> > > >>>>>>>>>>>>>> just a group of tasks.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Max
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > >>>>>>>>>>>>> joshipoornima06@gmail.com>
> > > >>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thank you for your email.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > >>>>>>>>>>> bin.huangxb@gmail.com
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> > > >>>>>> rewrites
> > > >>>>>>>> the
> > > >>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > >>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > > >> it
> > > >>>>> will
> > > >>>>>>>> give a
> > > >>>>>>>>>>>> flat
> > > >>>>>>>>>>>>>>>>>> structure at
> > > >>>>>>>>>>>>>>>>>> the task level
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> The serialized_dag representation already
> > > >> does
> > > >>>>> this I
> > > >>>>>>>>> think.
> > > >>>>>>>>>> At
> > > >>>>>>>>>>>>> least
> > > >>>>>>>>>>>>>>> if
> > > >>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > > >>> representation,
> > > >>>>> but
> > > >>>>>> at
> > > >>>>>>>>> least
> > > >>>>>>>>>>> it
> > > >>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
> > > >> In
> > > >>> my
> > > >>>>>>>> proposal
> > > >>>>>>>>> as
> > > >>>>>>>>>>>> also
> > > >>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
> > > >> from
> > > >>>> the
> > > >>>>>>> subdag
> > > >>>>>>>>> and
> > > >>>>>>>>>>> add
> > > >>>>>>>>>>>>>> them
> > > >>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG graph
> > > >>>> will
> > > >>>>>> look
> > > >>>>>>>>>> exactly
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>> same as without subdag but with metadata
> > > >> attached
> > > >>>> to
> > > >>>>>>> those
> > > >>>>>>>>>>>> sections.
> > > >>>>>>>>>>>>>>> These
> > > >>>>>>>>>>>>>>>> metadata will be later on used to render in the
> > > >>> UI.
> > > >>>>> So
> > > >>>>>>>> after
> > > >>>>>>>>>>>> parsing
> > > >>>>>>>>>>>>> (
> > > >>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
> > > >> the
> > > >>>>>>> *root_dag
> > > >>>>>>>>>>>> *instead
> > > >>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>> *root_dag +
> > > >>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > > >>>>>>>>>> current_group=section-1,
> > > >>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> > > >>> naming
> > > >>>>>>>>>>> suggestions),
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>> reason for parent_group is that we can have
> > > >>> nested
> > > >>>>>> group
> > > >>>>>>>> and
> > > >>>>>>>>>>>> still
> > > >>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>> able to capture the dependency.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Runtime DAG:
> > > >>>>>>>>>>>>>>>> [image: image.png]
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> While at the UI, what we see would be something
> > > >>>> like
> > > >>>>>> this
> > > >>>>>>>> by
> > > >>>>>>>>>>>>> utilizing
> > > >>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into
> > > >> in
> > > >>>> some
> > > >>>>>>> way.
> > > >>>>>>>>>>>>>>>> [image: image.png]
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> The benefits I can see is that:
> > > >>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > > >>> complexity
> > > >>>> of
> > > >>>>>>>> SubDag
> > > >>>>>>>>>> for
> > > >>>>>>>>>>>>>>> execution
> > > >>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> > > >> using
> > > >>>>>> SubDag.
> > > >>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
> > > >>>>> reusable
> > > >>>>>>> dag
> > > >>>>>>>>> code
> > > >>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>> declare dependencies between them. And with the
> > > >>> new
> > > >>>>>>>>>>> SubDagOperator
> > > >>>>>>>>>>>>> (see
> > > >>>>>>>>>>>>>>> AIP
> > > >>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
> > > >>>>> function
> > > >>>>>>> for
> > > >>>>>>>>>>>>> generating 1
> > > >>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
> > > >>> (in
> > > >>>>> this
> > > >>>>>>>> case,
> > > >>>>>>>>>> it
> > > >>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>> just
> > > >>>>>>>>>>>>>>>> extract all underlying tasks and append to the
> > > >>> root
> > > >>>>>> dag).
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> > > >>>> with a
> > > >>>>>>>>>> simpler
> > > >>>>>>>>>>>>>> concept
> > > >>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> > > >> out
> > > >>>> the
> > > >>>>>>>>>> contents
> > > >>>>>>>>>>>> of
> > > >>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>> SubDag
> > > >>>>>>>>>>>>>>>> and becomes more like
> > > >>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > > >>>>>>>>>>>>>>> (forgive
> > > >>>>>>>>>>>>>>>> me about the crazy name..). In this case, it is
> > > >>>> still
> > > >>>>>>>>>>> necessary
> > > >>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>> keep the
> > > >>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a
> > > >>>> name?
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> > > >>>> Chris
> > > >>>>>>> Palmer
> > > >>>>>>>>> for
> > > >>>>>>>>>>>>> helping
> > > >>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup, I
> > > >>>> will
> > > >>>>>> just
> > > >>>>>>>>> paste
> > > >>>>>>>>>>> it
> > > >>>>>>>>>>>>>> here.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > > >> in
> > > >>>> the
> > > >>>>>> same
> > > >>>>>>>>>>>> TaskGroup,
> > > >>>>>>>>>>>>>> but
> > > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > > >> a
> > > >>>>>>> TaskGroup
> > > >>>>>>>>>> and
> > > >>>>>>>>>>>>>> either a
> > > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > > >> in
> > > >>>> any
> > > >>>>>>> group
> > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > > >>> TaskGroup
> > > >>>>> and
> > > >>>>>>>>>> either
> > > >>>>>>>>>>>>> other
> > > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > > >> as
> > > >>> a
> > > >>>>>> single
> > > >>>>>>>>>>>> "object",
> > > >>>>>>>>>>>>>> but
> > > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > > >>>>> "status"
> > > >>>>>>> of a
> > > >>>>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>>>> was
> > > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I agree with Chris:
> > > >>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > > >>> executor), I
> > > >>>>>> think
> > > >>>>>>>>>>> TaskGroup
> > > >>>>>>>>>>>>>>> should
> > > >>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> > > >> to
> > > >>>>>>> implement
> > > >>>>>>>>>> some
> > > >>>>>>>>>>>>>> metadata
> > > >>>>>>>>>>>>>>>> operations that allows start/stop a group of
> > > >>> tasks
> > > >>>>>> etc.)
> > > >>>>>>>>>>>>>>>> - From the UI's View, it should be able to pick
> > > >>> up
> > > >>>>> the
> > > >>>>>>>>>> individual
> > > >>>>>>>>>>>>>> tasks'
> > > >>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > > >> status
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > > >> Imberman
> > > >>> <
> > > >>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator
> > > >>> to
> > > >>>>> tie
> > > >>>>>>> dags
> > > >>>>>>>>>>>> together
> > > >>>>>>>>>>>>>> but
> > > >>>>>>>>>>>>>>> I
> > > >>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if we
> > > >>>> could
> > > >>>>>>>>>> essentially
> > > >>>>>>>>>>>>> write
> > > >>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > > >>>> starter-tasks
> > > >>>>>> for
> > > >>>>>>>>> that
> > > >>>>>>>>>>> DAG.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
> > > >> UI
> > > >>>>>> concept.
> > > >>>>>>>> It
> > > >>>>>>>>>>>> doesn’t
> > > >>>>>>>>>>>>>> need
> > > >>>>>>>>>>>>>>>>> to execute separately, you’re just adding more
> > > >>>> tasks
> > > >>>>>> to
> > > >>>>>>>> the
> > > >>>>>>>>>>> queue
> > > >>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>> be executed when there are resources
> > > >> available.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <chris@crpalmer.com>
> > > >>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex abstraction. I think
> > > >>>>>>>>>>>>>>>>> what is needed/useful is a TaskGroup concept. On a high level I
> > > >>>>>>>>>>>>>>>>> think you want this functionality:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in the same TaskGroup,
> > > >>>>>>>>>>>>>>>>>   but *cannot* have dependencies between a Task in a TaskGroup and
> > > >>>>>>>>>>>>>>>>>   either a Task in a different TaskGroup or a Task not in any group
> > > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either other
> > > >>>>>>>>>>>>>>>>>   TaskGroups or Tasks not in any group
> > > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup as a single "object",
> > > >>>>>>>>>>>>>>>>>   but which you expand or zoom into in some way
> > > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the "status" of a TaskGroup
> > > >>>>>>>>>>>>>>>>>   was at least for UI display purposes
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Not sure if it would need to be a top level object with its own
> > > >>>>>>>>>>>>>>>>> database table and model or just another attribute on tasks. I
> > > >>>>>>>>>>>>>>>>> think you could build it in a way such that from the scheduler's
> > > >>>>>>>>>>>>>>>>> point of view a DAG with TaskGroups doesn't get treated any
> > > >>>>>>>>>>>>>>>>> differently. So it really just becomes a shortcut for setting
> > > >>>>>>>>>>>>>>>>> dependencies between sets of Tasks, and allows the UI to simplify
> > > >>>>>>>>>>>>>>>>> the render of the DAG structure.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Chris
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > >>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
> > > >> the
> > > >>>> more
> > > >>>>>>>>> important
> > > >>>>>>>>>>>> issue
> > > >>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>> fix),
> > > >>>>>>>>>>>>>>>>>> but I am still convinced Ash' idea is the
> > > >>> right
> > > >>>>> way
> > > >>>>>>>>> forward
> > > >>>>>>>>>>>> (just
> > > >>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>> might
> > > >>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
> > > >>> adding
> > > >>>>>>> visual
> > > >>>>>>>>>>> grouping
> > > >>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> UI).
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> There was a previous thread about this FYI
> > > >>> with
> > > >>>>> more
> > > >>>>>>>>> context
> > > >>>>>>>>>>> on
> > > >>>>>>>>>>>>> why
> > > >>>>>>>>>>>>>>>>> subdags
> > > >>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>
> > > >> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > >>>>>>>>>>>>>> . A
> > > >>>>>>>>>>>>>>>>>> solution I outline there to Jame's problem
> > > >> is
> > > >>>> e.g.
> > > >>>>>>>>> enabling
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> operator
> > > >>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as
> > > >>>> well. I
> > > >>>>>> see
> > > >>>>>>>>> this
> > > >>>>>>>>>>>> being
> > > >>>>>>>>>>>>>>>>> separate
> > > >>>>>>>>>>>>>>>>>> from Ash' solution for DAG grouping in the
> > > >> UI
> > > >>>> but
> > > >>>>>> one
> > > >>>>>>> of
> > > >>>>>>>>> the
> > > >>>>>>>>>>> two
> > > >>>>>>>>>>>>>> items
> > > >>>>>>>>>>>>>>>>>> required to replace all existing subdag
> > > >>>>>> functionality.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
> > > >> and
> > > >>>>> they
> > > >>>>>>> are
> > > >>>>>>>>>>> always a
> > > >>>>>>>>>>>>>> giant
> > > >>>>>>>>>>>>>>>>> pain
> > > >>>>>>>>>>>>>>>>>> to use. They are a constant source of user
> > > >>>>> confusion
> > > >>>>>>> and
> > > >>>>>>>>>>>> breakages
> > > >>>>>>>>>>>>>>>>> during
> > > >>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
> > > >> Coder <
> > > >>>>>>>>>>>> jcoder01@gmail.com>
> > > >>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a
> > > >> UI
> > > >>>>>>> concept. I
> > > >>>>>>>>> use
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> subdag
> > > >>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If
> > > >>> you
> > > >>>>>> have a
> > > >>>>>>>>> group
> > > >>>>>>>>>>> of
> > > >>>>>>>>>>>>>> tasks
> > > >>>>>>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>>>>>> need to finish before another group of
> > > >> tasks
> > > >>>>>> start,
> > > >>>>>>>>> using
> > > >>>>>>>>>> a
> > > >>>>>>>>>>>>> subdag
> > > >>>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies
> > > >>>> and I
> > > >>>>>>> think
> > > >>>>>>>>>> also
> > > >>>>>>>>>>>> make
> > > >>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>>> easier
> > > >>>>>>>>>>>>>>>>>>> to follow the dag code.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
> > > >> Hamlin
> > > >>> <
> > > >>>>>>>>>>>>> hamlin.kn@gmail.com>
> > > >>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > >>>>>> Berlin-Taylor
> > > >>>>>>> <
> > > >>>>>>>>>>>>>> ash@apache.org
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Question:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator anymore?
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just replacing it with a UI grouping
> > > >>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less to get wrong, and closer to
> > > >>>>>>>>>>>>>>>>>>>>> what users actually want to achieve with subdags?
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in subdags could start running in
> > > >>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should we not also just _entirely_
> > > >>>>>>>>>>>>>>>>>>>>> remove the concept of a sub dag and replace it with something
> > > >>>>>>>>>>>>>>>>>>>>> simpler?
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I haven't used them extensively so
> > > >>>>>>>>>>>>>>>>>>>>> may be wrong on some of these):
> > > >>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it has(?) to be of the form
> > > >>>>>>>>>>>>>>>>>>>>>   `parent_dag_id.subdag_id`.
> > > >>>>>>>>>>>>>>>>>>>>> - They need their own schedule_interval, but it has to match the
> > > >>>>>>>>>>>>>>>>>>>>>   parent dag
> > > >>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own. (Does it make sense to do
> > > >>>>>>>>>>>>>>>>>>>>>   this? Pausing just a sub dag would mean the sub dag would never
> > > >>>>>>>>>>>>>>>>>>>>>   execute, so the SubDagOperator would fail too.)
> > > >>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to operate a subdag with -- always
> > > >>>>>>>>>>>>>>>>>>>>>   a bit of a kludge.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Thoughts?
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> -ash
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash
> > > >>>>>> Berlin-Taylor <
> > > >>>>>>>>>>>>>> ash@apache.org>
> > > >>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm excited to see how this
> > > >>>>>>>>>>>>>>>>>>>>>> progresses.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> > > >>> parsing*:
> > > >>>>> This
> > > >>>>>>>>>> rewrites
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > >>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> > > >>> parsing,
> > > >>>>> and
> > > >>>>>> it
> > > >>>>>>>>> will
> > > >>>>>>>>>>>> give a
> > > >>>>>>>>>>>>>>> flat
> > > >>>>>>>>>>>>>>>>>>>>>>> structure at
> > > >>>>>>>>>>>>>>>>>>>>>>> the task level
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation
> > > >>>> already
> > > >>>>>> does
> > > >>>>>>>>> this
> > > >>>>>>>>>> I
> > > >>>>>>>>>>>>> think.
> > > >>>>>>>>>>>>>>> At
> > > >>>>>>>>>>>>>>>>>> least
> > > >>>>>>>>>>>>>>>>>>>> if
> > > >>>>>>>>>>>>>>>>>>>>>> I've understood your idea here
> > > >>>> correctly.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> -ash
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin
> > > >>>> Huang <
> > > >>>>>>>>>>>>>>> bin.huangxb@gmail.com
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone and
> > > >>>> collect
> > > >>>>>>>>> feedback
> > > >>>>>>>>>> on
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>> AIP-34
> > > >>>>>>>>>>>>>>>>>> on
> > > >>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was
> > > >>>>>> previously
> > > >>>>>>>>>> briefly
> > > >>>>>>>>>>>>>>>>> mentioned in
> > > >>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be
> > > >>> done
> > > >>>>> for
> > > >>>>>>>>> Airflow
> > > >>>>>>>>>>> 2.0,
> > > >>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>> one of
> > > >>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator
> > > >>> attach
> > > >>>>>> tasks
> > > >>>>>>>> back
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>> root
> > > >>>>>>>>>>>>>>>>> DAG.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving
> > > >>>>>> SubDagOperator
> > > >>>>>>>>>> related
> > > >>>>>>>>>>>>>> issues
> > > >>>>>>>>>>>>>>> by
> > > >>>>>>>>>>>>>>>>>>>>> reattaching
> > > >>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag
> > > >> while
> > > >>>>>>> respecting
> > > >>>>>>>>>>>>>> dependencies
> > > >>>>>>>>>>>>>>>>>> during
> > > >>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping
> > > >> effect
> > > >>>> on
> > > >>>>>> the
> > > >>>>>>> UI
> > > >>>>>>>>>> will
> > > >>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>> achieved
> > > >>>>>>>>>>>>>>>>>>>> through
> > > >>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory
> > > >>>> function
> > > >>>>>> more
> > > >>>>>>>>>>> reusable
> > > >>>>>>>>>>>>>>> because
> > > >>>>>>>>>>>>>>>>> you
> > > >>>>>>>>>>>>>>>>>>>> don't
> > > >>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and
> > > >>>>>>> child_dag_name
> > > >>>>>>>>> in
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> function
> > > >>>>>>>>>>>>>>>>>>>>> signature
> > > >>>>>>>>>>>>>>>>>>>>>>> anymore.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> > > >>>>>>>>>>>>>>>>>>>>>>>   *DagBag.bag_dag* method to unpack subdag while parsing, and it
> > > >>>>>>>>>>>>>>>>>>>>>>>   will give a flat structure at the task level
> > > >>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts like a
> > > >>>>>>>>>>>>>>>>>>>>>>>   container and most of the original methods are removed. The
> > > >>>>>>>>>>>>>>>>>>>>>>>   signature is also changed to *subdag_factory* with *subdag_args*
> > > >>>>>>>>>>>>>>>>>>>>>>>   and *subdag_kwargs*. This is similar to the PythonOperator
> > > >>>>>>>>>>>>>>>>>>>>>>>   signature.
> > > >>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group & parent_group
> > > >>>>>>>>>>>>>>>>>>>>>>>   attributes to BaseOperator*: This metadata is used to group tasks
> > > >>>>>>>>>>>>>>>>>>>>>>>   for rendering at UI level. It may potentially extend further to
> > > >>>>>>>>>>>>>>>>>>>>>>>   group arbitrary tasks outside the context of subdag to allow
> > > >>>>>>>>>>>>>>>>>>>>>>>   group-level operations (i.e. stop/trigger a group of tasks within
> > > >>>>>>>>>>>>>>>>>>>>>>>   the dag)
> > > >>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to allow
> > > >>>>>>>>>>>>>>>>>>>>>>>   (un)collapse a group of tasks for a flat structure to pair with
> > > >>>>>>>>>>>>>>>>>>>>>>>   the first change instead of the original hierarchical structure.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Please see related documents and
> > > >> PRs
> > > >>>> for
> > > >>>>>>>> details:
> > > >>>>>>>>>>>>>>>>>>>>>>> AIP:
> > > >>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
> > > >>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any
> > > >>>>> aspects
> > > >>>>>>>> that
> > > >>>>>>>>>> you
> > > >>>>>>>>>>>>>>>>>> agree/disagree
> > > >>>>>>>>>>>>>>>>>>>>>>> with or
> > > >>>>>>>>>>>>>>>>>>>>>>> need more clarification (especially
> > > >>> the
> > > >>>>>> third
> > > >>>>>>>>>> change
> > > >>>>>>>>>>>>>>> regarding
> > > >>>>>>>>>>>>>>>>>>>>> TaskGroup).
> > > >>>>>>>>>>>>>>>>>>>>>>> Any comments are welcome and I am
> > > >>>> looking
> > > >>>>>>>> forward
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>> it!
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>>> Thanks & Regards
> > > >>>>>>>>>>>>>>> Poornima
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>>
> > > >>>>>>> Jarek Potiuk
> > > >>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >>>>>>>
> > > >>>>>>> M: +48 660 796 129
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>>
> > > >>>> *Jacob Ferriero*
> > > >>>>
> > > >>>> Strategic Cloud Engineer: Data Engineering
> > > >>>>
> > > >>>> jferriero@google.com
> > > >>>>
> > > >>>> 617-714-2509
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>
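
The dependency shorthand discussed in the quoted thread above (`start >> download_group` fanning out to the group's starter tasks, while every task still lives in one DAG) can be sketched in plain Python. This is only an illustration of the idea, not Airflow code: `Task`, `TaskGroup`, `roots()`, `leaves()` and the two helper functions are hypothetical names, and the actual design in the PRs may differ.

```python
# Hypothetical stand-ins for illustration; these are NOT Airflow classes.
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()    # ids of tasks that must finish first
        self.downstream = set()  # ids of tasks that wait on this one

    def __rshift__(self, other):
        # task >> task, or task >> group (fans out to the group's roots)
        _link([self], _entry_points(other))
        return other


class TaskGroup:
    """A named container of tasks with no scheduling logic of its own."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = {}

    def add_task(self, task):
        self.tasks[task.task_id] = task
        return task

    def roots(self):
        # Members with no upstream task inside the group ("starter tasks").
        members = set(self.tasks)
        return [t for t in self.tasks.values() if not (t.upstream & members)]

    def leaves(self):
        # Members with no downstream task inside the group.
        members = set(self.tasks)
        return [t for t in self.tasks.values() if not (t.downstream & members)]

    def __rshift__(self, other):
        # group >> task, or group >> group
        _link(self.leaves(), _entry_points(other))
        return other


def _entry_points(node):
    # A group is entered through its roots; a plain task through itself.
    return node.roots() if isinstance(node, TaskGroup) else [node]


def _link(sources, targets):
    # Record an edge from every source task to every target task.
    for source in sources:
        for target in targets:
            source.downstream.add(target.task_id)
            target.upstream.add(source.task_id)


download = TaskGroup("download")
task1 = download.add_task(Task("task1"))
task2 = download.add_task(Task("task2"))
task3 = download.add_task(Task("task3"))
task1 >> task3
task2 >> task3

start, end = Task("start"), Task("end")
start >> download >> end  # wires start to task1/task2 and task3 to end
```

In this sketch the group only resolves which member tasks an edge should attach to, so the resulting graph is the same flat set of task-to-task edges a scheduler would see without any grouping.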

Re: [AIP-34] Rewrite SubDagOperator

Posted by Kaxil Naik <ka...@gmail.com>.
What's your ID? If you haven't created an account yet, please create
one at https://cwiki.apache.org/confluence/signup.action and send us your
ID, and we will add permissions.

> Thanks. I'll edit the AIP. May I request permission to edit it?
> My wiki user email is yuqian1990@gmail.com.


On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yu...@gmail.com> wrote:

> Re, Xinbin. Thanks. I'll edit the AIP. May I request permission to edit it?
> My wiki user email is yuqian1990@gmail.com.
>
> Re Gerard: yes, the UI loads all the nodes as JSON from the web server at
> once. However, it only adds the top-level nodes and edges to the graph when
> the Graph View page is first opened, and then adds the expanded nodes to
> the graph as the user expands them. From what I've experienced with DAGs
> containing around 400 tasks (not using TaskGroup or SubDagOperator),
> opening the whole dag in Graph View usually takes 5 seconds. Less than 60ms
> of that is taken by loading the data from the webserver. The remaining 4.9s+
> is taken by JavaScript functions in dagre-d3.min.js such as createNodes,
> createEdgeLabels, etc., and by rendering the graph. With TaskGroup being used
> to group tasks into a smaller number of top-level nodes, the amount of data
> loaded from the webserver will remain about the same compared to a flat dag
> of the same size, but the number of nodes and edges that need to be plotted
> on the graph can be reduced significantly. So in theory this should reduce
> the time it takes to open Graph View even without lazy-loading the data (I'll
> experiment to find out). That said, if it comes to a point where lazy-loading
> helps, we can still implement it as an improvement.
>
> Re James: the Tree View looks as if all the groups are fully expanded
> (because under the hood all the tasks are in a single DAG). I'm less
> worried about Tree View at the moment because it already has a mechanism
> for collapsing tasks by the dependency tree. That said, the Tree View can
> definitely be improved too with TaskGroup. (e.g. collapse tasks in the same
> TaskGroup when Tree View is first opened).
>
> For both suggestions, implementing them doesn't require fundamental changes
> to the idea. I think we can have a basic working TaskGroup first, and then
> improve it incrementally in several PRs as we get more feedback from the
> community. What do you think?
>
> Qian
>
>
> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jc...@gmail.com> wrote:
>
> > I agree this looks great, one question, how does the tree view look?
> >
> > James Coder
> >
> > > On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gcasassaez@twitter.com
> .invalid>
> > wrote:
> > >
> > > First of all, this is awesome!!
> > >
> > > Secondly, checking your UI code, seems you are loading all operators at
> > > once. Wondering if we can load them as needed (aka load whenever we
> click
> > > the TaskGroup). Some of our DAGs are so large that take forever to load
> > on
> > > the Graph view, so worried about this still being an issue here. It may
> > be
> > > easily solvable by implementing lazy loading of the graph. Not sure how
> > > easy to implement/add to the UI extension (and dont want to push for
> > early
> > > optimization as its the root of all evil).
> > > Gerard Casas Saez
> > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >
> > >
> > >> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bi...@gmail.com>
> > wrote:
> > >>
> > >> Hi Yu,
> > >>
> > >> Thank you so much for taking on this. I was fairly distracted
> previously
> > >> and I didn't have the time to update the proposal. In fact, after
> > >> discussing with Ash, Kaxil and Daniel, the direction of this AIP has
> > been
> > >> changed to favor the concept of TaskGroup instead of rewriting
> > > >> SubDagOperator (though it may make sense to deprecate SubDag in a
> future
> > >> date.).
> > >>
> > > >> Your PR is amazing and it has implemented the desired features. I think
> > we
> > >> can focus on your new PR instead. Do you mind updating the AIP based
> on
> > >> what you have done in your PR?
> > >>
> > >> Best,
> > >> Bin
> > >>
> > >>
> > >>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yu...@gmail.com>
> wrote:
> > >>>
> > >>> Hi, all, I've added the basic UI changes to my proposed
> implementation
> > of
> > >>> TaskGroup as UI grouping concept:
> > >>> https://github.com/apache/airflow/pull/10153
> > >>>
> > >>> I think Chris had a pretty good specification of TaskGroup so i'm
> > quoting
> > >>> it here. The only thing I don't fully agree with is the restriction
> > > >>> "... *cannot* have dependencies between a Task in a TaskGroup and
> > > >>> either a Task in a different TaskGroup or a Task not in any group". I
> think
> > > >>> this is overly restrictive. Since TaskGroup is a UI concept, tasks can
> > have
> > >>> dependencies on tasks in other TaskGroup or not in any TaskGroup. In
> my
> > >> PR,
> > >>> this is allowed. The graph edges will update accordingly when
> > TaskGroups
> > >>> are expanded/collapsed. TaskGroup is only helping to make the UI look
> > >> less
> > >>> crowded. Under the hood, everything is still a DAG of tasks and edges
> > so
> > >>> things work normally. Here's a screenshot
> > >>> <
> > >>>
> > >>
> >
> https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif
> > >>>>
> > >>> of the UI interaction.
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> *   - Tasks can be added to a TaskGroup   - You *can* have
> dependencies
> > >>> between Tasks in the same TaskGroup, but   *cannot* have dependencies
> > >>> between a Task in a TaskGroup and either a   Task in a different
> > >> TaskGroup
> > >>> or a Task not in any group   - You *can* have dependencies between a
> > >>> TaskGroup and either other   TaskGroups or Tasks not in any group   -
> > The
> > >>> UI will by default render a TaskGroup as a single "object", but
>  which
> > >> you
> > >>> expand or zoom into in some way   - You'd need some way to determine
> > what
> > >>> the "status" of a TaskGroup was   at least for UI display purposes*
> > >>>
> > >>>
> > >>> Regarding Jake's comment, I agree it's possible to implement the
> > >> "retrying
> > >>> tasks in a group" pattern he mentioned as an optional feature of
> > >> TaskGroup
> > >>> although that may go against having TaskGroup as a pure UI concept.
> For
> > >> the
> > >>> motivating example Jake provided, I suggest implementing both
> > >>> SubmitLongRunningJobTask and PollJobStatusSensor in a single
> operator.
> > It
> > >>> can do something like BaseSensorOperator.execute() does in
> "reschedule"
> > >>> mode, i.e. it first executes some code to submit the long running job
> > to
> > >>> the external service, and store the state (e.g. in XCom). Then
> > reschedule
> > > >>> itself. Subsequent runs then poke for the completion state.
> > >>>
> > >>>
> > >>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> > >> <jferriero@google.com.invalid
> > >>>>
> > >>> wrote:
> > >>>
> > >>>> I really like this idea of a TaskGroup container as I think this
> will
> > >> be
> > >>>> much easier to use than SubDag.
> > >>>>
> > >>>> I'd like to propose an optional behavior for special retry mechanics
> > >> via
> > >>> a
> > >>>> TaskGroup.retry_all property.
> > >>>> This way I could use TaskGroup to replace my favorite use of SubDag
> > for
> > >>>> atomically retrying tasks of the pattern "act on external state then
> > >>>> reschedule poll until desired state reached".
> > >>>>
> > >>>> Motivating use case I have for a SubDag is very simple two task
> group
> > >>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > >>>> I use SubDag because it gives me an easy way to retry the
> > >>> SubmitJobTask
> > >>>> if something about the PollJobSensor fails.
> > >>>> This pattern would be really nice for jobs that are expected to run
> a
> > >>> long
> > > >>>> time (because the sensor can use reschedule mode, freeing up
> > >> slots)
> > >>>> but might fail for a retryable reason.
> > >>>> However, using SubDag to meet this use case defeats the purpose
> > because
> > >>>> SubDag infamously
> > >>>> <
> > >>>>
> > >>>
> > >>
> >
> https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10
> > >>>>>
> > >>>> blocks a "controller" slot for the entire duration.
> > > >>>> This may feel like a cyclic behavior but in reality it is very common
> for
> > >> a
> > >>>> single operator to submit job / wait til done.
> > > >>>> We could use this case to refactor many operators (e.g. BQ, Dataproc,
> > >>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask]
> with
> > >> an
> > >>>> optional reschedule mode if user knows that this job may take a long
> > >>> time.
> > >>>>
> > > >>>> I'd be happy to do the development work on adding this specific retry
> > >>> behavior
> > >>>> to TaskGroup once the base concept is implemented if others in the
> > >>>> community would find this a useful feature.
> > >>>>
> > >>>> Cheers,
> > >>>> Jake
> > >>>>
> > >>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <
> > Jarek.Potiuk@polidea.com
> > >>>
> > >>>> wrote:
> > >>>>
> > >>>>> All for it :) . I think we are getting closer to have regular
> > >> planning
> > >>>> and
> > >>>>> making some structured approach to 2.0 and starting task force for
> it
> > >>>> soon,
> > >>>>> so I think this should be perfectly fine to discuss and even start
> > >>>>> implementing what's beyond as soon as we make sure that we are
> > >>>> prioritizing
> > >>>>> 2.0 work.
> > >>>>>
> > >>>>> J,
> > >>>>>
> > >>>>>
> > >>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yu...@gmail.com>
> > >> wrote:
> > >>>>>
> > >>>>>> Hi Jarek,
> > >>>>>>
> > >>>>>> I agree we should not change the behaviour of the existing
> > >>>> SubDagOperator
> > >>>>>> till Airflow 2.1. Is it okay to continue the discussion about
> > >>> TaskGroup
> > >>>>> as
> > >>>>>> a brand new concept/feature independent from the existing
> > >>>> SubDagOperator?
> > >>>>>> In other words, shall we add TaskGroup as a UI grouping concept
> > >> like
> > >>>> Ash
> > > >>>>>> suggested, and not touch SubDagOperator at all. Whenever we are
> > >>> ready
> > >>>>> with
> > >>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
> > >>>>>>
> > >>>>>> I really like Ash's idea of simplifying the SubDagOperator idea
> > >> into
> > >>> a
> > >>>>>> simple UI grouping concept. I think Xinbin's idea of "reattaching
> > >> all
> > >>>> the
> > >>>>>> tasks to the root DAG" is the way to go. And I see James pointed
> > >> out
> > >>> we
> > >>>>>> need some helper functions to simplify dependencies setting of
> > >>>> TaskGroup.
> > >>>>>> Xinbin put up a pretty elegant example in his PR
> > >>>>>> <https://github.com/apache/airflow/pull/9243>. I think having
> > >>>> TaskGroup
> > >>>>> as
> > >>>>>> a UI concept should be a relatively small change. We can simplify
> > >>>>> Xinbin's
> > >>>>>> PR further. So I put up this alternative proposal here:
> > >>>>>> https://github.com/apache/airflow/pull/10153
> > >>>>>>
> > >>>>>> I have not done any UI changes due to lack of experience with web
> > >> UI.
> > >>>> If
> > >>>>>> anyone's interested, please take a look at the PR.
> > >>>>>>
> > >>>>>> Qian
> > >>>>>>
> > >>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > >>> Jarek.Potiuk@polidea.com
> > >>>>>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Similar point here to the other ideas that are popping up. Maybe
> > >> we
> > >>>>>> should
> > >>>>>>> just focus on completing 2.0 and make all discussions about
> > >> further
> > >>>>>>> improvements to 2.1? While those are important discussions (and
> > >> we
> > >>>>> should
> > >>>>>>> continue them in the  near future !) I think at this point
> > >> focusing
> > >>>> on
> > >>>>>>> delivering 2.0 in its current shape should be our focus now ?
> > >>>>>>>
> > >>>>>>> J.
> > >>>>>>>
> > >>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > >>> bin.huangxb@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi Daniel
> > >>>>>>>>
> > >>>>>>>> I agree that the TaskGroup should have the same API as a DAG
> > >>> object
> > >>>>>>> related
> > >>>>>>>> to task dependencies, but it will not have anything related to
> > >>>> actual
> > >>>>>>>> execution or scheduling.
> > >>>>>>>> I will update the AIP according to this over the weekend.
> > >>>>>>>>
> > >>>>>>>>> We could even make a “DAGTemplate” object s.t. when you
> > >> import
> > >>>> the
> > >>>>>>> object
> > >>>>>>>> you can import it with parameters to determine the shape of the
> > >>>> DAG.
> > >>>>>>>>
> > >>>>>>>> Can you elaborate a bit more on this? Does it serve a similar
> > >>>> purpose
> > >>>>>> as
> > >>>>>>> a
> > >>>>>>>> DAG factory function?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > >>>>>>> daniel.imberman@gmail.com
> > >>>>>>>>>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Bin,
> > >>>>>>>>>
> > >>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g.
> > >>> the
> > >>>>>>> bitwise
> > > >>>>>>>>> operator for task dependencies). We could even make a
> > >>>> “DAGTemplate”
> > >>>>>>>> object
> > >>>>>>>>> s.t. when you import the object you can import it with
> > >>> parameters
> > >>>>> to
> > >>>>>>>>> determine the shape of the DAG.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > >>>>> bin.huangxb@gmail.com
> > >>>>>>>
> > >>>>>>>>> wrote:
> > >>>>>>>>> The TaskGroup will not take schedule interval as a parameter
> > >>>>> itself,
> > >>>>>>> and
> > >>>>>>>> it
> > >>>>>>>>> depends on the DAG where it attaches to. In my opinion, the
> > >>>>> TaskGroup
> > >>>>>>>> will
> > >>>>>>>>> only contain a group of tasks with interdependencies, and the
> > >>>>>> TaskGroup
> > >>>>>>>>> behaves like a task. It doesn't contain any
> > >>> execution/scheduling
> > >>>>>> logic
> > >>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs etc.)
> > >>> like
> > >>>> a
> > >>>>>> DAG
> > >>>>>>>>> does.
> > >>>>>>>>>
> > >>>>>>>>>> For example, there is the scenario that the schedule
> > >> interval
> > >>>> of
> > >>>>>> DAG
> > >>>>>>> is
> > >>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 min.
> > >>>>>>>>>
> > >>>>>>>>> I am curious why you ask this. Is this a use case that you
> > >> want
> > >>>> to
> > >>>>>>>> achieve?
> > >>>>>>>>>
> > >>>>>>>>> Bin
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> > >> thanosxnicholas@gmail.com
> > >>>>
> > >>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi Bin,
> > >>>>>>>>>> Using TaskGroup, Is the schedule interval of TaskGroup the
> > >>> same
> > >>>>> as
> > >>>>>>> the
> > >>>>>>>>>> parent DAG? My main concern is whether the schedule
> > >> interval
> > >>> of
> > >>>>>>>> TaskGroup
> > > >>>>>>>>>> could be different from that of the DAG? For example, there
> > >>> is
> > >>>>> the
> > >>>>>>>>> scenario
> > >>>>>>>>>> that the schedule interval of DAG is 1 hour and the
> > >> schedule
> > >>>>>> interval
> > >>>>>>>> of
> > >>>>>>>>>> TaskGroup is 20 min.
> > >>>>>>>>>>
> > >>>>>>>>>> Cheers,
> > >>>>>>>>>> Nicholas
> > >>>>>>>>>>
> > >>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > >>>>>> bin.huangxb@gmail.com
> > >>>>>>>>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi Nicholas,
> > >>>>>>>>>>>
> > >>>>>>>>>>> I am not sure about the old behavior of SubDagOperator,
> > >>> maybe
> > >>>>> it
> > >>>>>>> will
> > >>>>>>>>>> throw
> > >>>>>>>>>>> an error? But in the original proposal, the subdag's
> > >>>>>>>> schedule_interval
> > >>>>>>>>>> will
> > >>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to replace
> > >>>> SubDag,
> > >>>>>>> there
> > >>>>>>>>>> will
> > >>>>>>>>>>> be no subdag schedule_interval.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Bin
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > >>>> thanosxnicholas@gmail.com
> > >>>>>>
> > >>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi Bin,
> > >>>>>>>>>>>> Thanks for your good proposal. I was confused whether
> > >> the
> > >>>>>>> schedule
> > >>>>>>>>>>>> interval of SubDAG is different from that of the parent
> > >>>> DAG?
> > >>>>> I
> > >>>>>>> have
> > >>>>>>>>>>>> discussed with Jiajie Zhong about the schedule interval
> > >>> of
> > >>>>>>> SubDAG.
> > >>>>>>>> If
> > >>>>>>>>>> the
> > >>>>>>>>>>>> SubDagOperator has a different schedule interval, what
> > >>> will
> > >>>>>>> happen
> > >>>>>>>>> for
> > >>>>>>>>>>> the
> > >>>>>>>>>>>> scheduler to schedule the parent DAG?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Regards,
> > >>>>>>>>>>>> Nicholas Jiang
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > >>>>>>>> bin.huangxb@gmail.com>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback!
> > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I have rethought the concept of subdag and task
> > >>>>>> groups. I
> > >>>>>>>>> think
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>> better way to approach this is to entirely remove
> > >>> subdag
> > >>>>> and
> > >>>>>>>>>> introduce
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>> concept of TaskGroup, which is a container of tasks
> > >>> along
> > >>>>>> with
> > >>>>>>>>> their
> > >>>>>>>>>>>>> dependencies *without execution/scheduling logic as a
> > >>>> DAG*.
> > >>>>>> The
> > >>>>>>>>> only
> > >>>>>>>>>>>>> purpose of it is to group a list of tasks, but you
> > >>> still
> > >>>>> need
> > >>>>>>> to
> > >>>>>>>>> add
> > >>>>>>>>>> it
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>>> a DAG for execution.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Here is a small code snippet.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> ```
> > >>>>>>>>>>>>> class TaskGroup:
> > >>>>>>>>>>>>> """
> > >>>>>>>>>>>>> A TaskGroup contains a group of tasks.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> If default_args is missing, it will take default args
> > >>>> from
> > >>>>>> the
> > >>>>>>>>>> DAG.
> > >>>>>>>>>>>>> """
> > >>>>>>>>>>>>> def __init__(self, group_id, default_args):
> > >>>>>>>>>>>>> pass
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> """
> > >>>>>>>>>>>>> You can add tasks to a task group similar to adding
> > >>> tasks
> > >>>>> to
> > >>>>>> a
> > >>>>>>>> DAG
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> This can be declared in a separate file from the dag
> > >>> file
> > >>>>>>>>>>>>> """
> > >>>>>>>>>>>>> download_group = TaskGroup(group_id='download',
> > >>>>>>>>>>>> default_args=default_args)
> > >>>>>>>>>>>>> download_group.add_task(task1)
> > >>>>>>>>>>>>> task2.dag = download_group
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> with download_group:
> > >>>>>>>>>>>>> task3 = DummyOperator(task_id='task3')
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> [task, task2] >> task3
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> """Add it to a DAG for execution"""
> > >>>>>>>>>>>>> with DAG(dag_id='start_download_dag',
> > >>>>>>> default_args=default_args,
> > >>>>>>>>>>>>> schedule_interval='@daily', ...) as dag:
> > >>>>>>>>>>>>> start = DummyOperator(task_id='start')
> > >>>>>>>>>>>>> start >> download_group
> > >>>>>>>>>>>>> # this is equivalent to
> > >>>>>>>>>>>>> # start >> [task, task2] >> task3
> > >>>>>>>>>>>>> ```
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> With this, we can still reuse a group of tasks and
> > >> set
> > >>>>>>>> dependencies
> > >>>>>>>>>>>> between
> > >>>>>>>>>>>>> them; it avoids the boilerplate code from using
> > >>>>>> SubDagOperator,
> > >>>>>>>> and
> > >>>>>>>>>> we
> > >>>>>>>>>>>> can
> > >>>>>>>>>>>>> declare dependencies as `task >> task_group >> task`.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> User migration wise, we can introduce it before
> > >> Airflow
> > >>>> 2.0
> > >>>>>> and
> > >>>>>>>>> allow
> > >>>>>>>>>>>>> gradual transition. Then we can decide if we still
> > >> want
> > >>>> to
> > >>>>>> keep
> > >>>>>>>> the
> > >>>>>>>>>>>>> SubDagOperator or simply remove it.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Any thoughts?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > >>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> +1, proposal looks good.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> The original intention was really to have tasks
> > >>> groups
> > >>>>> and
> > >>>>>> a
> > >>>>>>>>>>>> zoom-in/out
> > >>>>>>>>>>>>> in
> > >>>>>>>>>>>>>> the UI. The original reasoning was to reuse the DAG
> > >>>>> object
> > >>>>>>>> since
> > >>>>>>>>> it
> > >>>>>>>>>>> is
> > >>>>>>>>>>>> a
> > >>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> > >>> create
> > >>>>>>>> underlying
> > >>>>>>>>>>>>>> confusions since a DAG is much more than just a
> > >> group
> > >>>> of
> > >>>>>>> tasks.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Max
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > >>>>>>>>>>>>> joshipoornima06@gmail.com>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thank you for your email.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > >>>>>>>>>>> bin.huangxb@gmail.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> > >>>>>> rewrites
> > >>>>>>>> the
> > >>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > >>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > >> it
> > >>>>> will
> > >>>>>>>> give a
> > >>>>>>>>>>>> flat
> > >>>>>>>>>>>>>>>>>> structure at
> > >>>>>>>>>>>>>>>>>> the task level
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> The serialized_dag representation already
> > >> does
> > >>>>> this I
> > >>>>>>>>> think.
> > >>>>>>>>>> At
> > >>>>>>>>>>>>> least
> > >>>>>>>>>>>>>>> if
> > >>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > >>> representation,
> > >>>>> but
> > >>>>>> at
> > >>>>>>>>> least
> > >>>>>>>>>>> it
> > >>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
> > >> In
> > >>> my
> > >>>>>>>> proposal
> > >>>>>>>>> as
> > >>>>>>>>>>>> also
> > >>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
> > >> from
> > >>>> the
> > >>>>>>> subdag
> > >>>>>>>>> and
> > >>>>>>>>>>> add
> > >>>>>>>>>>>>>> them
> > >>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG graph
> > >>>> will
> > >>>>>> look
> > >>>>>>>>>> exactly
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>> same as without subdag but with metadata
> > >> attached
> > >>>> to
> > >>>>>>> those
> > >>>>>>>>>>>> sections.
> > >>>>>>>>>>>>>>> These
> > >>>>>>>>>>>>>>>> metadata will be later on used to render in the
> > >>> UI.
> > >>>>> So
> > >>>>>>>> after
> > >>>>>>>>>>>> parsing
> > >>>>>>>>>>>>> (
> > >>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
> > >> the
> > >>>>>>> *root_dag
> > >>>>>>>>>>>> *instead
> > >>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>> *root_dag +
> > >>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> > >>>>>>>>>> current_group=section-1,
> > >>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> > >>> naming
> > >>>>>>>>>>> suggestions),
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>> reason for parent_group is that we can have
> > >>> nested
> > >>>>>> group
> > >>>>>>>> and
> > >>>>>>>>>>>> still
> > >>>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>>> able to capture the dependency.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Runtime DAG:
> > >>>>>>>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> While at the UI, what we see would be something
> > >>>> like
> > >>>>>> this
> > >>>>>>>> by
> > >>>>>>>>>>>>> utilizing
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into
> > >> in
> > >>>> some
> > >>>>>>> way.
> > >>>>>>>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> The benefits I can see are:
> > >>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
> > >>> complexity
> > >>>> of
> > >>>>>>>> SubDag
> > >>>>>>>>>> for
> > >>>>>>>>>>>>>>> execution
> > >>>>>>>>>>>>>>>> and scheduling. It will be the same as not
> > >> using
> > >>>>>> SubDag.
> > >>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
> > >>>>> reusable
> > >>>>>>> dag
> > >>>>>>>>> code
> > >>>>>>>>>>> and
> > >>>>>>>>>>>>>>>> declare dependencies between them. And with the
> > >>> new
> > >>>>>>>>>>> SubDagOperator
> > >>>>>>>>>>>>> (see
> > >>>>>>>>>>>>>>> AIP
> > >>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
> > >>>>> function
> > >>>>>>> for
> > >>>>>>>>>>>>> generating 1
> > >>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
> > >>> (in
> > >>>>> this
> > >>>>>>>> case,
> > >>>>>>>>>> it
> > >>>>>>>>>>>> will
> > >>>>>>>>>>>>>>> just
> > >>>>>>>>>>>>>>>> extract all underlying tasks and append to the
> > >>> root
> > >>>>>> dag).
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> > >>>> with a
> > >>>>>>>>>> simpler
> > >>>>>>>>>>>>>> concept
> > >>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> > >> out
> > >>>> the
> > >>>>>>>>>> contents
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>> SubDag
> > >>>>>>>>>>>>>>>> and becomes more like
> > >>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> > >>>>>>>>>>>>>>> (forgive
> > > >>>>>>>>>>>>>>>> me about the crazy name..). In this case, is it
> > >>>> still
> > >>>>>>>>>>> necessary
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>>>>> keep the
> > >>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a
> > >>>> name?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> > >>>> Chris
> > >>>>>>> Palmer
> > >>>>>>>>> for
> > >>>>>>>>>>>>> helping
> > >>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup, I
> > >>>> will
> > >>>>>> just
> > >>>>>>>>> paste
> > >>>>>>>>>>> it
> > >>>>>>>>>>>>>> here.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> > >> in
> > >>>> the
> > >>>>>> same
> > >>>>>>>>>>>> TaskGroup,
> > >>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> > >> a
> > >>>>>>> TaskGroup
> > >>>>>>>>>> and
> > >>>>>>>>>>>>>> either a
> > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> > >> in
> > >>>> any
> > >>>>>>> group
> > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > >>> TaskGroup
> > >>>>> and
> > >>>>>>>>>> either
> > >>>>>>>>>>>>> other
> > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > >> as
> > >>> a
> > >>>>>> single
> > >>>>>>>>>>>> "object",
> > >>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > >>>>> "status"
> > >>>>>>> of a
> > >>>>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>>>> was
> > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I agree with Chris:
> > >>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> > >>> executor), I
> > >>>>>> think
> > >>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>>>> should
> > >>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> > >> to
> > >>>>>>> implement
> > >>>>>>>>>> some
> > >>>>>>>>>>>>>> metadata
> > >>>>>>>>>>>>>>>> operations that allows start/stop a group of
> > >>> tasks
> > >>>>>> etc.)
> > >>>>>>>>>>>>>>>> - From the UI's View, it should be able to pick
> > >>> up
> > >>>>> the
> > >>>>>>>>>> individual
> > >>>>>>>>>>>>>> tasks'
> > >>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> > >> status
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> > >> Imberman
> > >>> <
> > >>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator
> > >>> to
> > >>>>> tie
> > >>>>>>> dags
> > >>>>>>>>>>>> together
> > >>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>> I
> > >>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if we
> > >>>> could
> > >>>>>>>>>> essentially
> > >>>>>>>>>>>>> write
> > >>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>> the ability to set dependencies to all
> > >>>> starter-tasks
> > >>>>>> for
> > >>>>>>>>> that
> > >>>>>>>>>>> DAG.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
> > >> UI
> > >>>>>> concept.
> > >>>>>>>> It
> > >>>>>>>>>>>> doesn’t
> > >>>>>>>>>>>>>> need
> > >>>>>>>>>>>>>>>>> to execute separately, you’re just adding more
> > >>>> tasks
> > >>>>>> to
> > >>>>>>>> the
> > >>>>>>>>>>> queue
> > >>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>> be executed when there are resources
> > >> available.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
> > >> <
> > >>>>>>>>>>> chris@crpalmer.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
> > >>>>>> abstraction.
> > >>>>>>> I
> > >>>>>>>>>> think
> > >>>>>>>>>>>> what
> > >>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
> > >> high
> > >>>>> level
> > >>>>>> I
> > >>>>>>>>> think
> > >>>>>>>>>>> you
> > >>>>>>>>>>>>> want
> > >>>>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>> functionality:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in
> > >>> the
> > >>>>>> same
> > >>>>>>>>>>> TaskGroup,
> > >>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in a
> > >>>>>> TaskGroup
> > >>>>>>>> and
> > >>>>>>>>>>>> either
> > >>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not in
> > >>> any
> > >>>>>> group
> > >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> > >>> TaskGroup
> > >>>>> and
> > >>>>>>>> either
> > >>>>>>>>>>> other
> > >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> > >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> > >> as a
> > >>>>>> single
> > >>>>>>>>>>> "object",
> > >>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> > >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> > >>>> "status"
> > >>>>>> of
> > >>>>>>> a
> > >>>>>>>>>>>> TaskGroup
> > >>>>>>>>>>>>>> was
> > >>>>>>>>>>>>>>>>> at least for UI display purposes
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
> > >>> object
> > >>>>>> with
> > >>>>>>>> its
> > >>>>>>>>>> own
> > >>>>>>>>>>>>>> database
> > >>>>>>>>>>>>>>>>> table and model or just another attribute on
> > >>>> tasks.
> > >>>>> I
> > >>>>>>>> think
> > >>>>>>>>>> you
> > >>>>>>>>>>>>> could
> > >>>>>>>>>>>>>>>>> build
> > > >>>>>>>>>>>>>>>>> it in a way such that from the scheduler's
> > >> point
> > >>> of
> > >>>>>> view
> > >>>>>>> a
> > >>>>>>>>> DAG
> > >>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
> > >> differently.
> > >>> So
> > >>>>> it
> > >>>>>>>> really
> > >>>>>>>>>>> just
> > >>>>>>>>>>>>>>> becomes
> > >>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>> shortcut for setting dependencies between sets
> > >>> of
> > >>>>>> Tasks,
> > >>>>>>>> and
> > >>>>>>>>>>>> allows
> > >>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>> UI
> > >>>>>>>>>>>>>>>>> to simplify the render of the DAG structure.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Chris
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > >>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
> > >> the
> > >>>> more
> > >>>>>>>>> important
> > >>>>>>>>>>>> issue
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>> fix),
> > > >>>>>>>>>>>>>>>>>> but I am still convinced Ash's idea is the
> > >>> right
> > >>>>> way
> > >>>>>>>>> forward
> > >>>>>>>>>>>> (just
> > >>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>> might
> > >>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
> > >>> adding
> > >>>>>>> visual
> > >>>>>>>>>>> grouping
> > >>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>> UI).
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> There was a previous thread about this FYI
> > >>> with
> > >>>>> more
> > >>>>>>>>> context
> > >>>>>>>>>>> on
> > >>>>>>>>>>>>> why
> > >>>>>>>>>>>>>>>>> subdags
> > >>>>>>>>>>>>>>>>>> are bad and potential solutions:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>
> > >> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > >>>>>>>>>>>>>> . A
> > >>>>>>>>>>>>>>>>>> solution I outline there to Jame's problem
> > >> is
> > >>>> e.g.
> > >>>>>>>>> enabling
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> operator
> > >>>>>>>>>>>>>>>>>> for Airflow operators to work with DAGs as
> > >>>> well. I
> > >>>>>> see
> > >>>>>>>>> this
> > >>>>>>>>>>>> being
> > >>>>>>>>>>>>>>>>> separate
> > >>>>>>>>>>>>>>>>>> from Ash' solution for DAG grouping in the
> > >> UI
> > >>>> but
> > >>>>>> one
> > >>>>>>> of
> > >>>>>>>>> the
> > >>>>>>>>>>> two
> > >>>>>>>>>>>>>> items
> > >>>>>>>>>>>>>>>>>> required to replace all existing subdag
> > >>>>>> functionality.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
> > >> and
> > >>>>> they
> > >>>>>>> are
> > >>>>>>>>>>> always a
> > >>>>>>>>>>>>>> giant
> > >>>>>>>>>>>>>>>>> pain
> > >>>>>>>>>>>>>>>>>> to use. They are a constant source of user
> > >>>>> confusion
> > >>>>>>> and
> > >>>>>>>>>>>> breakages
> > >>>>>>>>>>>>>>>>> during
> > >>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James Coder <jcoder01@gmail.com>
> > >>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a UI concept. I use the
> > >>>>>>>>>>>>>>>>>>> subdag operator to simplify dependencies too. If you have a
> > >>>>>>>>>>>>>>>>>>> group of tasks that need to finish before another group of
> > >>>>>>>>>>>>>>>>>>> tasks start, using a subdag is a pretty quick way to set those
> > >>>>>>>>>>>>>>>>>>> dependencies and I think also make it easier to follow the dag
> > >>>>>>>>>>>>>>>>>>> code.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin.kn@gmail.com>
> > >>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor
> > >>>>>>>>>>>>>>>>>>>> <ash@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Question:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator anymore?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just replacing it with a UI
> > >>>>>>>>>>>>>>>>>>>>> grouping concept be conceptually simpler, less to get wrong,
> > >>>>>>>>>>>>>>>>>>>>> and closer to what users actually want to achieve with
> > >>>>>>>>>>>>>>>>>>>>> subdags?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in subdags could start
> > >>>>>>>>>>>>>>>>>>>>> running in parallel (a good change) -- so should we not also
> > >>>>>>>>>>>>>>>>>>>>> just _entirely_ remove the concept of a sub dag and replace
> > >>>>>>>>>>>>>>>>>>>>> it with something simpler?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I haven't used them
> > >>>>>>>>>>>>>>>>>>>>> extensively so may be wrong on some of these):
> > >>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it has(?) to be of the form
> > >>>>>>>>>>>>>>>>>>>>>   `parent_dag_id.subdag_id`.
> > >>>>>>>>>>>>>>>>>>>>> - They need their own schedule_interval, but it has to match
> > >>>>>>>>>>>>>>>>>>>>>   the parent dag
> > >>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own. (Does it make sense to
> > >>>>>>>>>>>>>>>>>>>>>   do this? Pausing just a sub dag would mean the sub dag
> > >>>>>>>>>>>>>>>>>>>>>   would never execute, so the SubDagOperator would fail too.)
> > >>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to operate a subdag with --
> > >>>>>>>>>>>>>>>>>>>>>   always a bit of a kludge.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Thoughts?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> -ash
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor
> > >>>>>>>>>>>>>>>>>>>>> <ash@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm excited to see how this
> > >>>>>>>>>>>>>>>>>>>>>> progresses.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> > >>>>>>>>>>>>>>>>>>>>>>>   *DagBag.bag_dag* method to unpack subdag while parsing,
> > >>>>>>>>>>>>>>>>>>>>>>>   and it will give a flat structure at the task level
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I think.
> > >>>>>>>>>>>>>>>>>>>>>> At least if I've understood your idea here correctly.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> -ash
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin Huang
> > >>>>>>>>>>>>>>>>>>>>>> <bin.huangxb@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone to collect feedback on
> > >>>>>>>>>>>>>>>>>>>>>>> AIP-34, which is about rewriting SubDagOperator. This was
> > >>>>>>>>>>>>>>>>>>>>>>> previously briefly mentioned in the discussion about what
> > >>>>>>>>>>>>>>>>>>>>>>> needs to be done for Airflow 2.0, and one of the ideas is
> > >>>>>>>>>>>>>>>>>>>>>>> to make SubDagOperator attach tasks back to the root DAG.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving SubDagOperator related
> > >>>>>>>>>>>>>>>>>>>>>>> issues by reattaching all tasks back to the root dag while
> > >>>>>>>>>>>>>>>>>>>>>>> respecting dependencies during parsing. The original
> > >>>>>>>>>>>>>>>>>>>>>>> grouping effect on the UI will be achieved through
> > >>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory function more reusable
> > >>>>>>>>>>>>>>>>>>>>>>> because you don't need to have parent_dag_name and
> > >>>>>>>>>>>>>>>>>>>>>>> child_dag_name in the function signature anymore.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> > >>>>>>>>>>>>>>>>>>>>>>>   *DagBag.bag_dag* method to unpack subdag while parsing,
> > >>>>>>>>>>>>>>>>>>>>>>>   and it will give a flat structure at the task level
> > >>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts
> > >>>>>>>>>>>>>>>>>>>>>>>   like a container and most of the original methods are
> > >>>>>>>>>>>>>>>>>>>>>>>   removed. The signature is also changed to
> > >>>>>>>>>>>>>>>>>>>>>>>   *subdag_factory* with *subdag_args* and *subdag_kwargs*.
> > >>>>>>>>>>>>>>>>>>>>>>>   This is similar to the PythonOperator signature.
> > >>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group &
> > >>>>>>>>>>>>>>>>>>>>>>>   parent_group attributes to BaseOperator*: This metadata
> > >>>>>>>>>>>>>>>>>>>>>>>   is used to group tasks for rendering at UI level. It may
> > >>>>>>>>>>>>>>>>>>>>>>>   potentially extend further to group arbitrary tasks
> > >>>>>>>>>>>>>>>>>>>>>>>   outside the context of subdag to allow group-level
> > >>>>>>>>>>>>>>>>>>>>>>>   operations (e.g. stop/trigger a group of tasks within the
> > >>>>>>>>>>>>>>>>>>>>>>>   dag)
> > >>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to
> > >>>>>>>>>>>>>>>>>>>>>>>   allow (un)collapsing a group of tasks for a flat
> > >>>>>>>>>>>>>>>>>>>>>>>   structure, to pair with the first change, instead of the
> > >>>>>>>>>>>>>>>>>>>>>>>   original hierarchical structure.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Please see related documents and PRs for details:
> > >>>>>>>>>>>>>>>>>>>>>>> AIP:
> > >>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
> > >>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any aspects that you
> > >>>>>>>>>>>>>>>>>>>>>>> agree/disagree with or need more clarification on
> > >>>>>>>>>>>>>>>>>>>>>>> (especially the third change regarding TaskGroup). Any
> > >>>>>>>>>>>>>>>>>>>>>>> comments are welcome and I am looking forward to them!
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>>>>>>>> Bin
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>> Thanks & Regards
> > >>>>>>>>>>>>>>> Poornima
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>>
> > >>>>>>> Jarek Potiuk
> > >>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>>>>>>
> > >>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > >>>>> <+48%20660%20796%20129>>
> > >>>>>>> [image: Polidea] <https://www.polidea.com/>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>>
> > >>>>> Jarek Potiuk
> > >>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>>>>
> > >>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > >>>>> <+48%20660%20796%20129>>
> > >>>>> [image: Polidea] <https://www.polidea.com/>
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>>
> > >>>> *Jacob Ferriero*
> > >>>>
> > >>>> Strategic Cloud Engineer: Data Engineering
> > >>>>
> > >>>> jferriero@google.com
> > >>>>
> > >>>> 617-714-2509
> > >>>>
> > >>>
> > >>
> >
> >
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by Yu Qian <yu...@gmail.com>.
Re Xinbin: Thanks. I'll edit the AIP. May I request permission to edit it?
My wiki user email is yuqian1990@gmail.com.

Re Gerard: yes, the UI loads all the nodes as JSON from the web server at
once. However, it only adds the top-level nodes and edges to the graph when
the Graph View page is first opened, and then adds the expanded nodes to
the graph as the user expands them. From what I've experienced with DAGs
containing around 400 tasks (not using TaskGroup or SubDagOperator),
opening the whole DAG in Graph View usually takes about 5 seconds. Less than
60ms of that is taken by loading the data from the webserver. The remaining
4.9s+ is taken by JavaScript functions in dagre-d3.min.js such as createNodes,
createEdgeLabels, etc. and by rendering the graph. With TaskGroup being used
to group tasks into a smaller number of top-level nodes, the amount of data
loaded from the webserver will remain about the same compared to a flat DAG of
the same size, but the number of nodes and edges that need to be plotted on
the graph can be reduced significantly. So in theory this should reduce the
time it takes to open Graph View even without lazy-loading the data (I'll
experiment to find out). That said, if it comes to a point where lazy-loading
helps, we can still implement it as an improvement.
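
To make the "only draw the top-level nodes until a group is expanded" idea
concrete, here is a rough sketch in Python. It is not the code in the PR, only
an illustration: `visible_graph` and its arguments are hypothetical names, and
it assumes a single level of grouping with group ids that never clash with
task ids.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]


def visible_graph(
    task_group: Dict[str, Optional[str]],
    edges: Iterable[Edge],
    expanded: Set[str],
) -> Tuple[Set[str], Set[Edge]]:
    """Return the nodes and edges to draw for the current expand state.

    task_group maps task_id -> group_id (None for ungrouped tasks).
    expanded holds the group ids the user has opened in the UI.
    """

    def shown_as(task_id: str) -> str:
        group = task_group[task_id]
        # A task inside a collapsed group is drawn as the group node itself.
        if group is None or group in expanded:
            return task_id
        return group

    nodes = {shown_as(task_id) for task_id in task_group}
    drawn: Set[Edge] = set()
    for upstream, downstream in edges:
        a, b = shown_as(upstream), shown_as(downstream)
        if a != b:  # edges internal to a collapsed group are not drawn
            drawn.add((a, b))
    return nodes, drawn


# Hypothetical five-task DAG with one group called "download".
tasks = {
    "start": None,
    "dl_a": "download",
    "dl_b": "download",
    "merge": "download",
    "report": None,
}
deps = [
    ("start", "dl_a"),
    ("start", "dl_b"),
    ("dl_a", "merge"),
    ("dl_b", "merge"),
    ("merge", "report"),
]

collapsed_nodes, collapsed_edges = visible_graph(tasks, deps, expanded=set())
expanded_nodes, expanded_edges = visible_graph(tasks, deps, expanded={"download"})
```

With the group collapsed, only three nodes and two edges need to be laid out
instead of five and five, which is where I would expect the rendering time to
be saved on large DAGs.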

Re James: the Tree View looks as if all the groups are fully expanded
(because under the hood all the tasks are in a single DAG). I'm less
worried about Tree View at the moment because it already has a mechanism
for collapsing tasks by the dependency tree. That said, the Tree View can
definitely be improved too with TaskGroup (e.g. collapse tasks in the same
TaskGroup when Tree View is first opened).

For both suggestions, implementing them doesn't require fundamental changes
to the idea. I think we can have a basic working TaskGroup first, and then
improve it incrementally in several PRs as we get more feedback from the
community. What do you think?

Qian


On Wed, Aug 12, 2020 at 9:15 AM James Coder <jc...@gmail.com> wrote:

> I agree this looks great. One question: how does the tree view look?
>
> James Coder
>
> > On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gc...@twitter.com.invalid>
> wrote:
> >
> > First of all, this is awesome!!
> >
> > Secondly, checking your UI code, it seems you are loading all operators at
> > once. I wonder if we can load them as needed (i.e. load whenever we click
> > the TaskGroup). Some of our DAGs are so large that they take forever to load
> > on the Graph view, so I'm worried about this still being an issue here. It
> > may be easily solvable by implementing lazy loading of the graph. I'm not
> > sure how easy that is to implement/add to the UI extension (and I don't want
> > to push for early optimization, as it's the root of all evil).
> > Gerard Casas Saez
> > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> >
> >
> >> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bi...@gmail.com>
> wrote:
> >>
> >> Hi Yu,
> >>
> >> Thank you so much for taking this on. I was fairly distracted previously
> >> and I didn't have the time to update the proposal. In fact, after
> >> discussing with Ash, Kaxil and Daniel, the direction of this AIP has been
> >> changed to favor the concept of TaskGroup instead of rewriting
> >> SubDagOperator (though it may make sense to deprecate SubDag at a future
> >> date).
> >>
> >> Your PR is amazing and it has implemented the desired features. I think we
> >> can focus on your new PR instead. Do you mind updating the AIP based on
> >> what you have done in your PR?
> >>
> >> Best,
> >> Bin
> >>
> >>
> >>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yu...@gmail.com> wrote:
> >>>
> >>> Hi all, I've added the basic UI changes to my proposed implementation of
> >>> TaskGroup as a UI grouping concept:
> >>> https://github.com/apache/airflow/pull/10153
> >>>
> >>> I think Chris had a pretty good specification of TaskGroup so I'm quoting
> >>> it here. The only thing I don't fully agree with is the restriction
> >>> "... *cannot* have dependencies between a Task in a TaskGroup and either a
> >>> Task in a different TaskGroup or a Task not in any group". I think
> >>> this is overly restrictive. Since TaskGroup is a UI concept, tasks can have
> >>> dependencies on tasks in other TaskGroups or not in any TaskGroup. In my
> >>> PR, this is allowed. The graph edges will update accordingly when
> >>> TaskGroups are expanded/collapsed. TaskGroup is only helping to make the UI
> >>> look less crowded. Under the hood, everything is still a DAG of tasks and
> >>> edges, so things work normally. Here's a screenshot of the UI interaction:
> >>> https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif
> >>>
> >>>    - Tasks can be added to a TaskGroup
> >>>    - You *can* have dependencies between Tasks in the same TaskGroup, but
> >>>      *cannot* have dependencies between a Task in a TaskGroup and either a
> >>>      Task in a different TaskGroup or a Task not in any group
> >>>    - You *can* have dependencies between a TaskGroup and either other
> >>>      TaskGroups or Tasks not in any group
> >>>    - The UI will by default render a TaskGroup as a single "object", but
> >>>      which you expand or zoom into in some way
> >>>    - You'd need some way to determine what the "status" of a TaskGroup was
> >>>      at least for UI display purposes
> >>>
> >>> Regarding Jake's comment, I agree it's possible to implement the "retrying
> >>> tasks in a group" pattern he mentioned as an optional feature of TaskGroup,
> >>> although that may go against having TaskGroup as a pure UI concept. For the
> >>> motivating example Jake provided, I suggest implementing both
> >>> SubmitLongRunningJobTask and PollJobStatusSensor in a single operator. It
> >>> can do something like what BaseSensorOperator.execute() does in "reschedule"
> >>> mode, i.e. it first executes some code to submit the long running job to
> >>> the external service, stores the state (e.g. in XCom), and then reschedules
> >>> itself. Subsequent runs then poke for the completion state.
> >>>
> >>>
> >>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero <jferriero@google.com.invalid>
> >>> wrote:
> >>>
> >>>> I really like this idea of a TaskGroup container as I think this will be
> >>>> much easier to use than SubDag.
> >>>>
> >>>> I'd like to propose an optional behavior for special retry mechanics via a
> >>>> TaskGroup.retry_all property.
> >>>> This way I could use TaskGroup to replace my favorite use of SubDag for
> >>>> atomically retrying tasks of the pattern "act on external state then
> >>>> reschedule poll until desired state reached".
> >>>>
> >>>> The motivating use case I have for a SubDag is a very simple two-task group
> >>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> >>>> I use SubDag because it gives me an easy way to retry the SubmitJobTask
> >>>> if something about the PollJobSensor fails.
> >>>> This pattern would be really nice for jobs that are expected to run a long
> >>>> time (because the sensor can use reschedule mode, freeing up slots)
> >>>> but might fail for a retryable reason.
> >>>> However, using SubDag to meet this use case defeats the purpose because
> >>>> SubDag infamously blocks a "controller" slot for the entire duration:
> >>>> https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10
> >>>> This may feel like a cyclic behavior, but in reality it is very common for
> >>>> a single operator to submit a job / wait until done.
> >>>> We could use this case to refactor many operators (e.g. BQ, Dataproc,
> >>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with an
> >>>> optional reschedule mode if the user knows that this job may take a long
> >>>> time.
> >>>>
> >>>> I'd be happy to do the development work on adding this specific retry
> >>>> behavior to TaskGroup once the base concept is implemented if others in the
> >>>> community would find this a useful feature.
> >>>>
> >>>> Cheers,
> >>>> Jake
> >>>>
> >>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >>>> wrote:
> >>>>
> >>>>> All for it :) . I think we are getting closer to having regular planning
> >>>>> and a structured approach to 2.0, and to starting a task force for it
> >>>>> soon, so I think it should be perfectly fine to discuss and even start
> >>>>> implementing what's beyond as soon as we make sure that we are
> >>>>> prioritizing 2.0 work.
> >>>>>
> >>>>> J,
> >>>>>
> >>>>>
> >>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yu...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Jarek,
> >>>>>>
> >>>>>> I agree we should not change the behaviour of the existing SubDagOperator
> >>>>>> till Airflow 2.1. Is it okay to continue the discussion about TaskGroup
> >>>>>> as a brand new concept/feature independent from the existing
> >>>>>> SubDagOperator? In other words, shall we add TaskGroup as a UI grouping
> >>>>>> concept like Ash suggested, and not touch SubDagOperator at all? Whenever
> >>>>>> we are ready with TaskGroup, we then deprecate SubDagOperator in Airflow
> >>>>>> 2.1.
> >>>>>>
> >>>>>> I really like Ash's idea of simplifying the SubDagOperator idea into a
> >>>>>> simple UI grouping concept. I think Xinbin's idea of "reattaching all the
> >>>>>> tasks to the root DAG" is the way to go. And I see James pointed out we
> >>>>>> need some helper functions to simplify dependencies setting of TaskGroup.
> >>>>>> Xinbin put up a pretty elegant example in his PR
> >>>>>> <https://github.com/apache/airflow/pull/9243>. I think having TaskGroup
> >>>>>> as a UI concept should be a relatively small change. We can simplify
> >>>>>> Xinbin's PR further. So I put up this alternative proposal here:
> >>>>>> https://github.com/apache/airflow/pull/10153
> >>>>>>
> >>>>>> I have not done any UI changes due to lack of experience with web UI. If
> >>>>>> anyone's interested, please take a look at the PR.
> >>>>>>
> >>>>>> Qian
> >>>>>>
> >>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Similar point here to the other ideas that are popping up. Maybe we
> >>>>>>> should just focus on completing 2.0 and move all discussions about
> >>>>>>> further improvements to 2.1? While those are important discussions (and
> >>>>>>> we should continue them in the near future!), I think delivering 2.0 in
> >>>>>>> its current shape should be our focus now.
> >>>>>>>
> >>>>>>> J.
> >>>>>>>
> >>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <bin.huangxb@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Daniel
> >>>>>>>>
> >>>>>>>> I agree that the TaskGroup should have the same API as a DAG object
> >>>>>>>> related to task dependencies, but it will not have anything related to
> >>>>>>>> actual execution or scheduling.
> >>>>>>>> I will update the AIP according to this over the weekend.
> >>>>>>>>
> >>>>>>>>> We could even make a “DAGTemplate” object s.t. when you import the
> >>>>>>>>> object you can import it with parameters to determine the shape of
> >>>>>>>>> the DAG.
> >>>>>>>>
> >>>>>>>> Can you elaborate a bit more on this? Does it serve a similar purpose
> >>>>>>>> as a DAG factory function?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman
> >>>>>>>> <daniel.imberman@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Bin,
> >>>>>>>>>
> >>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g. the
> >>>>>>>>> bitwise operator for task dependencies)? We could even make a
> >>>>>>>>> “DAGTemplate” object s.t. when you import the object you can import
> >>>>>>>>> it with parameters to determine the shape of the DAG.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <bin.huangxb@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>> The TaskGroup will not take schedule interval as a parameter itself,
> >>>>>>>>> and it depends on the DAG it is attached to. In my opinion, the
> >>>>>>>>> TaskGroup will only contain a group of tasks with interdependencies,
> >>>>>>>>> and the TaskGroup behaves like a task. It doesn't contain any
> >>>>>>>>> execution/scheduling logic (i.e. schedule_interval, concurrency,
> >>>>>>>>> max_active_runs etc.) like a DAG does.
> >>>>>>>>>
> >>>>>>>>>> For example, there is the scenario that the schedule interval of
> >>>>>>>>>> DAG is 1 hour and the schedule interval of TaskGroup is 20 min.
> >>>>>>>>>
> >>>>>>>>> I am curious why you ask this. Is this a use case that you want to
> >>>>>>>>> achieve?
> >>>>>>>>>
> >>>>>>>>> Bin
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <thanosxnicholas@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Bin,
> >>>>>>>>>> Using TaskGroup, is the schedule interval of TaskGroup the same as
> >>>>>>>>>> the parent DAG? My main concern is whether the schedule interval of
> >>>>>>>>>> TaskGroup could be different from that of the DAG. For example,
> >>>>>>>>>> there is the scenario that the schedule interval of DAG is 1 hour
> >>>>>>>>>> and the schedule interval of TaskGroup is 20 min.
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Nicholas
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <bin.huangxb@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Nicholas,
> >>>>>>>>>>>
> >>>>>>>>>>> I am not sure about the old behavior of SubDagOperator, maybe it
> >>>>>>>>>>> will throw an error? But in the original proposal, the subdag's
> >>>>>>>>>>> schedule_interval will be ignored. Or if we decide to use
> >>>>>>>>>>> TaskGroup to replace SubDag, there will be no subdag
> >>>>>>>>>>> schedule_interval.
> >>>>>>>>>>>
> >>>>>>>>>>> Bin
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <thanosxnicholas@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi Bin,
> >>>>>>>>>>>> Thanks for your good proposal. I was confused about whether the
> >>>>>>>>>>>> schedule interval of SubDAG can be different from that of the
> >>>>>>>>>>>> parent DAG. I have discussed the schedule interval of SubDAG
> >>>>>>>>>>>> with Jiajie Zhong. If the SubDagOperator has a different
> >>>>>>>>>>>> schedule interval, what will happen when the scheduler schedules
> >>>>>>>>>>>> the parent DAG?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Nicholas Jiang
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang
> >>>>>>>>>>>> <bin.huangxb@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone, for the feedback!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I have rethought the concept of subdag and task groups. I think
> >>>>>>>>>>>>> the better way to approach this is to entirely remove subdag and
> >>>>>>>>>>>>> introduce the concept of TaskGroup, which is a container of
> >>>>>>>>>>>>> tasks along with their dependencies *without execution/scheduling
> >>>>>>>>>>>>> logic as a DAG*. The only purpose of it is to group a list of
> >>>>>>>>>>>>> tasks, but you still need to add it to a DAG for execution.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Here is a small code snippet.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ```
> >>>>>>>>>>>>> class TaskGroup:
> >>>>>>>>>>>>>     """
> >>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> >>>>>>>>>>>>>     """
> >>>>>>>>>>>>>     def __init__(self, group_id, default_args):
> >>>>>>>>>>>>>         pass
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> """
> >>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to a DAG
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This can be declared in a separate file from the dag file
> >>>>>>>>>>>>> """
> >>>>>>>>>>>>> download_group = TaskGroup(group_id='download',
> >>>>>>>>>>>>>                            default_args=default_args)
> >>>>>>>>>>>>> download_group.add_task(task1)
> >>>>>>>>>>>>> task2.dag = download_group
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> with download_group:
> >>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [task1, task2] >> task3
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> """Add it to a DAG for execution"""
> >>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> >>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
> >>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> >>>>>>>>>>>>>     start >> download_group
> >>>>>>>>>>>>>     # this is equivalent to
> >>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> >>>>>>>>>>>>> ```
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> With this, we can still reuse a group of tasks and set
> >>>>>>>>>>>>> dependencies between them; it avoids the boilerplate code from
> >>>>>>>>>>>>> using SubDagOperator, and we can declare dependencies as
> >>>>>>>>>>>>> `task >> task_group >> task`.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> User migration wise, we can introduce it before Airflow 2.0 and
> >>>>>>>>>>>>> allow gradual transition. Then we can decide if we still want to
> >>>>>>>>>>>>> keep the SubDagOperator or simply remove it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Any thoughts?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>> Bin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> >>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> +1, proposal looks good.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The original intention was really to have tasks
> >>> groups
> >>>>> and
> >>>>>> a
> >>>>>>>>>>>> zoom-in/out
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>> the UI. The original reasoning was to reuse the DAG
> >>>>> object
> >>>>>>>> since
> >>>>>>>>> it
> >>>>>>>>>>> is
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>> group of tasks, but as highlighted here it does
> >>> create
> >>>>>>>> underlying
> >>>>>>>>>>>>>> confusions since a DAG is much more than just a
> >> group
> >>>> of
> >>>>>>> tasks.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Max
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> >>>>>>>>>>>>> joshipoornima06@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thank you for your email.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> >>>>>>>>>>> bin.huangxb@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
> >>>>>> rewrites
> >>>>>>>> the
> >>>>>>>>>>>>>>>> *DagBag.bag_dag*
> >>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> >> it
> >>>>> will
> >>>>>>>> give a
> >>>>>>>>>>>> flat
> >>>>>>>>>>>>>>>>>> structure at
> >>>>>>>>>>>>>>>>>> the task level
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The serialized_dag representation already
> >> does
> >>>>> this I
> >>>>>>>>> think.
> >>>>>>>>>> At
> >>>>>>>>>>>>> least
> >>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I am not sure about the serialized_dag representation, but at least it
> >>>>>>>>>>>>>>>> will still keep the subdag entry in the DAG table? In my proposal, as
> >>>>>>>>>>>>>>>> also in the draft PR, the idea is to *extract the tasks from the subdag
> >>>>>>>>>>>>>>>> and add them back to the root_dag*. So the runtime DAG graph will look
> >>>>>>>>>>>>>>>> exactly the same as without subdag but with metadata attached to those
> >>>>>>>>>>>>>>>> sections. This metadata will later be used for rendering in the UI. So
> >>>>>>>>>>>>>>>> after parsing (*DagBag.process_file()*), it will just output the
> >>>>>>>>>>>>>>>> *root_dag* instead of *root_dag + subdag + subdag + nested subdag* etc.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
> >>>>>>>>>> current_group=section-1,
> >>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
> >>> naming
> >>>>>>>>>>> suggestions),
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> reason for parent_group is that we can have
> >>> nested
> >>>>>> group
> >>>>>>>> and
> >>>>>>>>>>>> still
> >>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>> able to capture the dependency.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Runtime DAG:
> >>>>>>>>>>>>>>>> [image: image.png]
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> While at the UI, what we see would be something
> >>>> like
> >>>>>> this
> >>>>>>>> by
> >>>>>>>>>>>>> utilizing
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into
> >> in
> >>>> some
> >>>>>>> way.
> >>>>>>>>>>>>>>>> [image: image.png]
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The benefits I can see are:
> >>>>>>>>>>>>>>>> 1. We don't need to deal with the extra complexity of SubDag for
> >>>>>>>>>>>>>>>> execution and scheduling. It will be the same as not using SubDag.
> >>>>>>>>>>>>>>>> 2. We still have the benefits of modularized and reusable dag code and
> >>>>>>>>>>>>>>>> can declare dependencies between them. And with the new SubDagOperator
> >>>>>>>>>>>>>>>> (see AIP or draft PR), we can use the same dag_factory function for
> >>>>>>>>>>>>>>>> generating 1 dag, a lot of dynamic dags, or for SubDag (in this case, it
> >>>>>>>>>>>>>>>> will just extract all underlying tasks and append them to the root dag).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
> >>>> with a
> >>>>>>>>>> simpler
> >>>>>>>>>>>>>> concept
> >>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
> >> out
> >>>> the
> >>>>>>>>>> contents
> >>>>>>>>>>>> of
> >>>>>>>>>>>>> a
> >>>>>>>>>>>>>>> SubDag
> >>>>>>>>>>>>>>>> and becomes more like
> >>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
> >>>>>>>>>>>>>>> (forgive
> >>>>>>>>>>>>>>>> me about the crazy name..). In this case, it is
> >>>> still
> >>>>>>>>>>> necessary
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>> keep the
> >>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a
> >>>> name?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
> >>>> Chris
> >>>>>>> Palmer
> >>>>>>>>> for
> >>>>>>>>>>>>> helping
> >>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup, I
> >>>> will
> >>>>>> just
> >>>>>>>>> paste
> >>>>>>>>>>> it
> >>>>>>>>>>>>>> here.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
> >> in
> >>>> the
> >>>>>> same
> >>>>>>>>>>>> TaskGroup,
> >>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
> >> a
> >>>>>>> TaskGroup
> >>>>>>>>>> and
> >>>>>>>>>>>>>> either a
> >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
> >> in
> >>>> any
> >>>>>>> group
> >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> >>> TaskGroup
> >>>>> and
> >>>>>>>>>> either
> >>>>>>>>>>>>> other
> >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> >> as
> >>> a
> >>>>>> single
> >>>>>>>>>>>> "object",
> >>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> >>>>> "status"
> >>>>>>> of a
> >>>>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>>>> was
> >>>>>>>>>>>>>>>>> at least for UI display purposes
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I agree with Chris:
> >>>>>>>>>>>>>>>> - From the backend's view (scheduler &
> >>> executor), I
> >>>>>> think
> >>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
> >> to
> >>>>>>> implement
> >>>>>>>>>> some
> >>>>>>>>>>>>>> metadata
> >>>>>>>>>>>>>>>> operations that allows start/stop a group of
> >>> tasks
> >>>>>> etc.)
> >>>>>>>>>>>>>>>> - From the UI's View, it should be able to pick
> >>> up
> >>>>> the
> >>>>>>>>>> individual
> >>>>>>>>>>>>>> tasks'
> >>>>>>>>>>>>>>>> status and then determine the TaskGroup's
> >> status
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Bin
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
> >> Imberman
> >>> <
> >>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator
> >>> to
> >>>>> tie
> >>>>>>> dags
> >>>>>>>>>>>> together
> >>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if we
> >>>> could
> >>>>>>>>>> essentially
> >>>>>>>>>>>>> write
> >>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>> the ability to set dependencies to all
> >>>> starter-tasks
> >>>>>> for
> >>>>>>>>> that
> >>>>>>>>>>> DAG.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
> >> UI
> >>>>>> concept.
> >>>>>>>> It
> >>>>>>>>>>>> doesn’t
> >>>>>>>>>>>>>> need
> >>>>>>>>>>>>>>>>> to execute separately, you’re just adding more
> >>>> tasks
> >>>>>> to
> >>>>>>>> the
> >>>>>>>>>>> queue
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>> be executed when there are resources
> >> available.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
> >> <
> >>>>>>>>>>> chris@crpalmer.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
> >>>>>> abstraction.
> >>>>>>> I
> >>>>>>>>>> think
> >>>>>>>>>>>> what
> >>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
> >> high
> >>>>> level
> >>>>>> I
> >>>>>>>>> think
> >>>>>>>>>>> you
> >>>>>>>>>>>>> want
> >>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>> functionality:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> >>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in
> >>> the
> >>>>>> same
> >>>>>>>>>>> TaskGroup,
> >>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in a
> >>>>>> TaskGroup
> >>>>>>>> and
> >>>>>>>>>>>> either
> >>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not in
> >>> any
> >>>>>> group
> >>>>>>>>>>>>>>>>> - You *can* have dependencies between a
> >>> TaskGroup
> >>>>> and
> >>>>>>>> either
> >>>>>>>>>>> other
> >>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
> >>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
> >> as a
> >>>>>> single
> >>>>>>>>>>> "object",
> >>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>> which you expand or zoom into in some way
> >>>>>>>>>>>>>>>>> - You'd need some way to determine what the
> >>>> "status"
> >>>>>> of
> >>>>>>> a
> >>>>>>>>>>>> TaskGroup
> >>>>>>>>>>>>>> was
> >>>>>>>>>>>>>>>>> at least for UI display purposes
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
> >>> object
> >>>>>> with
> >>>>>>>> its
> >>>>>>>>>> own
> >>>>>>>>>>>>>> database
> >>>>>>>>>>>>>>>>> table and model or just another attribute on
> >>>> tasks.
> >>>>> I
> >>>>>>>> think
> >>>>>>>>>> you
> >>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>>> build
> >>>>>>>>>>>>>>>>> it in a way such that from the schedulers
> >> point
> >>> of
> >>>>>> view
> >>>>>>> a
> >>>>>>>>> DAG
> >>>>>>>>>>> with
> >>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
> >> differently.
> >>> So
> >>>>> it
> >>>>>>>> really
> >>>>>>>>>>> just
> >>>>>>>>>>>>>>> becomes
> >>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>> shortcut for setting dependencies between sets
> >>> of
> >>>>>> Tasks,
> >>>>>>>> and
> >>>>>>>>>>>> allows
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> UI
> >>>>>>>>>>>>>>>>> to simplify the render of the DAG structure.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Chris
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> >>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
> >> the
> >>>> more
> >>>>>>>>> important
> >>>>>>>>>>>> issue
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> fix),
> >>>>>>>>>>>>>>>>>> but I am still convinced Ash' idea is the
> >>> right
> >>>>> way
> >>>>>>>>> forward
> >>>>>>>>>>>> (just
> >>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>> might
> >>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
> >>> adding
> >>>>>>> visual
> >>>>>>>>>>> grouping
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> UI).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> There was a previous thread about this FYI with more context on why
> >>>>>>>>>>>>>>>>>> subdags are bad and potential solutions:
> >>>>>>>>>>>>>>>>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> >>>>>>>>>>>>>>>>>> A solution I outline there to James's problem is e.g. enabling the
> >>>>>>>>>>>>>>>>>> `>>` operator for Airflow operators to work with DAGs as well. I see
> >>>>>>>>>>>>>>>>>> this being separate from Ash's solution for DAG grouping in the UI but
> >>>>>>>>>>>>>>>>>> one of the two items required to replace all existing subdag
> >>>>>>>>>>>>>>>>>> functionality.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
> >> and
> >>>>> they
> >>>>>>> are
> >>>>>>>>>>> always a
> >>>>>>>>>>>>>> giant
> >>>>>>>>>>>>>>>>> pain
> >>>>>>>>>>>>>>>>>> to use. They are a constant source of user
> >>>>> confusion
> >>>>>>> and
> >>>>>>>>>>>> breakages
> >>>>>>>>>>>>>>>>> during
> >>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
> >> Coder <
> >>>>>>>>>>>> jcoder01@gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a
> >> UI
> >>>>>>> concept. I
> >>>>>>>>> use
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>> subdag
> >>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If
> >>> you
> >>>>>> have a
> >>>>>>>>> group
> >>>>>>>>>>> of
> >>>>>>>>>>>>>> tasks
> >>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> need to finish before another group of
> >> tasks
> >>>>>> start,
> >>>>>>>>> using
> >>>>>>>>>> a
> >>>>>>>>>>>>> subdag
> >>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies
> >>>> and I
> >>>>>>> think
> >>>>>>>>>> also
> >>>>>>>>>>>> make
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>> easier
> >>>>>>>>>>>>>>>>>>> to follow the dag code.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
> >> Hamlin
> >>> <
> >>>>>>>>>>>>> hamlin.kn@gmail.com>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
> >>>>>> Berlin-Taylor
> >>>>>>> <
> >>>>>>>>>>>>>> ash@apache.org
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Question:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator
> >>>> anymore?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just
> >>>>> replacing
> >>>>>> it
> >>>>>>>>> with
> >>>>>>>>>> a
> >>>>>>>>>>> UI
> >>>>>>>>>>>>>>>>> grouping
> >>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less
> >> to
> >>>> get
> >>>>>>>> wrong,
> >>>>>>>>>> and
> >>>>>>>>>>>>> closer
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> what
> >>>>>>>>>>>>>>>>>>>>> users actually want to achieve with
> >>>> subdags?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in subdags could start running in
> >>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should we not also just _entirely_ remove
> >>>>>>>>>>>>>>>>>>>>> the concept of a sub dag and replace it with something simpler?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I haven't used them extensively so may
> >>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
> >>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it has(?) to be of the form
> >>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
> >>>>>>>>>>>>>>>>>>>>> - They need their own schedule_interval, but it has to match the parent
> >>>>>>>>>>>>>>>>>>>>> dag
> >>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own. (Does it make sense to do this?
> >>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the sub dag would never execute, so
> >>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.)
> >>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to operate a subdag with -- always a
> >>>>>>>>>>>>>>>>>>>>> bit of a kludge.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Thoughts?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> -ash
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash
> >>>>>> Berlin-Taylor <
> >>>>>>>>>>>>>> ash@apache.org>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm excited to see how this
> >>>>>>>>>>>>>>>>>>>>>> progresses.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> >>> parsing*:
> >>>>> This
> >>>>>>>>>> rewrites
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> >>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> >>> parsing,
> >>>>> and
> >>>>>> it
> >>>>>>>>> will
> >>>>>>>>>>>> give a
> >>>>>>>>>>>>>>> flat
> >>>>>>>>>>>>>>>>>>>>>>> structure at
> >>>>>>>>>>>>>>>>>>>>>>> the task level
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation
> >>>> already
> >>>>>> does
> >>>>>>>>> this
> >>>>>>>>>> I
> >>>>>>>>>>>>> think.
> >>>>>>>>>>>>>>> At
> >>>>>>>>>>>>>>>>>> least
> >>>>>>>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>>>>> I've understood your idea here
> >>>> correctly.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> -ash
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin
> >>>> Huang <
> >>>>>>>>>>>>>>> bin.huangxb@gmail.com
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone and
> >>>> collect
> >>>>>>>>> feedback
> >>>>>>>>>> on
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> AIP-34
> >>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was
> >>>>>> previously
> >>>>>>>>>> briefly
> >>>>>>>>>>>>>>>>> mentioned in
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be
> >>> done
> >>>>> for
> >>>>>>>>> Airflow
> >>>>>>>>>>> 2.0,
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>> one of
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator
> >>> attach
> >>>>>> tasks
> >>>>>>>> back
> >>>>>>>>>> to
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> root
> >>>>>>>>>>>>>>>>> DAG.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving
> >>>>>> SubDagOperator
> >>>>>>>>>> related
> >>>>>>>>>>>>>> issues
> >>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>>>>> reattaching
> >>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag
> >> while
> >>>>>>> respecting
> >>>>>>>>>>>>>> dependencies
> >>>>>>>>>>>>>>>>>> during
> >>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping
> >> effect
> >>>> on
> >>>>>> the
> >>>>>>> UI
> >>>>>>>>>> will
> >>>>>>>>>>> be
> >>>>>>>>>>>>>>>>> achieved
> >>>>>>>>>>>>>>>>>>>> through
> >>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory
> >>>> function
> >>>>>> more
> >>>>>>>>>>> reusable
> >>>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>>> don't
> >>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and
> >>>>>>> child_dag_name
> >>>>>>>>> in
> >>>>>>>>>>> the
> >>>>>>>>>>>>>>> function
> >>>>>>>>>>>>>>>>>>>>> signature
> >>>>>>>>>>>>>>>>>>>>>>> anymore.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
> >>> parsing*:
> >>>>> This
> >>>>>>>>>> rewrites
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> >>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
> >>> parsing,
> >>>>> and
> >>>>>> it
> >>>>>>>>> will
> >>>>>>>>>>>> give a
> >>>>>>>>>>>>>>> flat
> >>>>>>>>>>>>>>>>>>>>>>> structure at
> >>>>>>>>>>>>>>>>>>>>>>> the task level
> >>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The
> >> new
> >>>>>>>>> SubDagOperator
> >>>>>>>>>>>> acts
> >>>>>>>>>>>>>>> like a
> >>>>>>>>>>>>>>>>>>>>>>> container and most of the original
> >>>>> methods
> >>>>>>> are
> >>>>>>>>>>> removed.
> >>>>>>>>>>>>> The
> >>>>>>>>>>>>>>>>>>>>>>> signature is
> >>>>>>>>>>>>>>>>>>>>>>> also changed to *subdag_factory
> >> *with
> >>>>>>>>> *subdag_args
> >>>>>>>>>>> *and
> >>>>>>>>>>>>>>>>>>>>> *subdag_kwargs*.
> >>>>>>>>>>>>>>>>>>>>>>> This is similar to the
> >> PythonOperator
> >>>>>>>> signature.
> >>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add
> >>>>>>> current_group
> >>>>>>>> &
> >>>>>>>>>>>>>> parent_group
> >>>>>>>>>>>>>>>>>>>>> attributes
> >>>>>>>>>>>>>>>>>>>>>>> to BaseOperator*: This metadata is
> >>> used
> >>>>> to
> >>>>>>>> group
> >>>>>>>>>>> tasks
> >>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>> rendering at
> >>>>>>>>>>>>>>>>>>>>>>> UI level. It may potentially extend
> >>>>> further
> >>>>>>> to
> >>>>>>>>>> group
> >>>>>>>>>>>>>>> arbitrary
> >>>>>>>>>>>>>>>>>>> tasks
> >>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to
> >>> allow
> >>>>>>>>> group-level
> >>>>>>>>>>>>>> operations
> >>>>>>>>>>>>>>>>>>> (i.e.
> >>>>>>>>>>>>>>>>>>>>>>> stop/trigger a group of task within
> >>> the
> >>>>>> dag)
> >>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*:
> >> Proposed
> >>>> UI
> >>>>>>>>>> modification
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>> allow
> >>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a
> >>>> flat
> >>>>>>>>> structure
> >>>>>>>>>> to
> >>>>>>>>>>>>> pair
> >>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> first
> >>>>>>>>>>>>>>>>>>>>>>> change instead of the original
> >>>>> hierarchical
> >>>>>>>>>>> structure.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Please see related documents and
> >> PRs
> >>>> for
> >>>>>>>> details:
> >>>>>>>>>>>>>>>>>>>>>>> AIP:
> >>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
> >>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any
> >>>>> aspects
> >>>>>>>> that
> >>>>>>>>>> you
> >>>>>>>>>>>>>>>>>> agree/disagree
> >>>>>>>>>>>>>>>>>>>>>>> with or
> >>>>>>>>>>>>>>>>>>>>>>> need more clarification (especially
> >>> the
> >>>>>> third
> >>>>>>>>>> change
> >>>>>>>>>>>>>>> regarding
> >>>>>>>>>>>>>>>>>>>>> TaskGroup).
> >>>>>>>>>>>>>>>>>>>>>>> Any comments are welcome and I am
> >>>> looking
> >>>>>>>> forward
> >>>>>>>>>> to
> >>>>>>>>>>>> it!
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>>>>>>>>> Bin
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>> Thanks & Regards
> >>>>>>>>>>>>>>> Poornima
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> Jarek Potiuk
> >>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>>>>>>
> >>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> >>>>> <+48%20660%20796%20129>>
> >>>>>>> [image: Polidea] <https://www.polidea.com/>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Jarek Potiuk
> >>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>>>>
> >>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> >>>>> <+48%20660%20796%20129>>
> >>>>> [image: Polidea] <https://www.polidea.com/>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> *Jacob Ferriero*
> >>>>
> >>>> Strategic Cloud Engineer: Data Engineering
> >>>>
> >>>> jferriero@google.com
> >>>>
> >>>> 617-714-2509
> >>>>
> >>>
> >>
>
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by James Coder <jc...@gmail.com>.
I agree this looks great. One question: how does the tree view look?

James Coder

> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gc...@twitter.com.invalid> wrote:
> 
> First of all, this is awesome!!
> 
> Secondly, checking your UI code, it seems you are loading all operators at
> once. I'm wondering if we can load them as needed (i.e. load whenever we click
> the TaskGroup). Some of our DAGs are so large that they take forever to load
> in the Graph view, so I'm worried about this still being an issue here. It may
> be easily solvable by implementing lazy loading of the graph. I'm not sure how
> easy that is to implement/add to the UI extension (and I don't want to push
> for early optimization, as it's the root of all evil).
> Gerard Casas Saez
> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> 
> 
>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bi...@gmail.com> wrote:
>> 
>> Hi Yu,
>> 
>> Thank you so much for taking this on. I was fairly distracted previously
>> and I didn't have the time to update the proposal. In fact, after
>> discussing with Ash, Kaxil and Daniel, the direction of this AIP has been
>> changed to favor the concept of TaskGroup instead of rewriting
>> SubDagOperator (though it may make sense to deprecate SubDag at a future
>> date).
>> 
>> Your PR is amazing and it has implemented the desired features. I think we
>> can focus on your new PR instead. Do you mind updating the AIP based on
>> what you have done in your PR?
>> 
>> Best,
>> Bin
>> 
>> 
>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yu...@gmail.com> wrote:
>>> 
>>> Hi, all, I've added the basic UI changes to my proposed implementation of
>>> TaskGroup as a UI grouping concept:
>>> https://github.com/apache/airflow/pull/10153
>>> 
>>> I think Chris had a pretty good specification of TaskGroup, so I'm quoting
>>> it here. The only thing I don't fully agree with is the restriction
>>> "... *cannot* have dependencies between a Task in a TaskGroup and either a
>>> Task in a different TaskGroup or a Task not in any group". I think this is
>>> overly restrictive. Since TaskGroup is a UI concept, tasks can have
>>> dependencies on tasks in other TaskGroups or on tasks not in any TaskGroup.
>>> In my PR, this is allowed. The graph edges will update accordingly when
>>> TaskGroups are expanded/collapsed. TaskGroup only helps make the UI look
>>> less crowded. Under the hood, everything is still a DAG of tasks and edges,
>>> so things work normally. Here's a screenshot
>>> <
>>> 
>> https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif
>>>> 
>>> of the UI interaction.
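The edge behaviour described above (edges between tasks are redrawn against the group node while a TaskGroup is collapsed) can be sketched roughly as follows. This is only an illustration, not the code in the PR: the function name `visible_edges`, the `group_of` mapping, and the example task ids are all made up here, and it assumes group ids never collide with task ids.

```python
def visible_edges(edges, group_of, collapsed):
    """Return the edges to draw when the groups in `collapsed` are folded.

    edges:     iterable of (upstream_task_id, downstream_task_id) pairs
    group_of:  dict mapping task_id -> group_id (ungrouped tasks are absent)
    collapsed: set of group_ids currently shown as a single node
    """
    def node(task_id):
        group = group_of.get(task_id)
        # A task inside a collapsed group is represented by the group node.
        return group if group in collapsed else task_id

    drawn = set()
    for upstream, downstream in edges:
        src, dst = node(upstream), node(downstream)
        if src != dst:  # edges internal to a collapsed group are hidden
            drawn.add((src, dst))
    return sorted(drawn)


# Hypothetical DAG: two groups with a cross-group dependency a1 -> b1.
edges = [("start", "a1"), ("a1", "a2"), ("a2", "b1"), ("a1", "b1"), ("b1", "end")]
group_of = {"a1": "group_a", "a2": "group_a", "b1": "group_b"}

print(visible_edges(edges, group_of, collapsed={"group_a"}))
# [('b1', 'end'), ('group_a', 'b1'), ('start', 'group_a')]
```

The underlying task-level edges are never modified; only the drawn view changes, which is consistent with treating TaskGroup as a UI-only concept.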
>>> 
>>> - Tasks can be added to a TaskGroup
>>> - You *can* have dependencies between Tasks in the same TaskGroup, but
>>>   *cannot* have dependencies between a Task in a TaskGroup and either a
>>>   Task in a different TaskGroup or a Task not in any group
>>> - You *can* have dependencies between a TaskGroup and either other
>>>   TaskGroups or Tasks not in any group
>>> - The UI will by default render a TaskGroup as a single "object", but
>>>   which you expand or zoom into in some way
>>> - You'd need some way to determine what the "status" of a TaskGroup was
>>>   at least for UI display purposes
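For the last point in the list above, one possible way to derive a display status for a TaskGroup from its tasks is a simple precedence rule. The sketch below is an assumption for illustration only: the precedence order and the `group_state` helper are invented here, and the thread does not settle on any particular rule. The state strings mirror common Airflow task states, with "no_status" standing in for tasks that have not run yet.

```python
# Hypothetical precedence: the first state in this list that appears among the
# group's tasks becomes the state displayed for the whole group.
PRECEDENCE = [
    "failed",
    "upstream_failed",
    "running",
    "queued",
    "scheduled",
    "no_status",
    "skipped",
    "success",
]


def group_state(task_states):
    """Collapse the states of a group's tasks into one state for display."""
    # Tasks without a state yet (None) are treated as "no_status".
    states = {state or "no_status" for state in task_states}
    for candidate in PRECEDENCE:
        if candidate in states:
            return candidate
    # Empty group, or only states this sketch does not know about.
    return "no_status"


print(group_state(["success", "running", "success"]))
print(group_state(["success", "success"]))
```

With this rule a group only shows "success" once every task in it has succeeded, and a single failed task is enough to mark the group as failed.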
>>> 
>>> 
>>> Regarding Jake's comment, I agree it's possible to implement the "retrying
>>> tasks in a group" pattern he mentioned as an optional feature of TaskGroup,
>>> although that may go against having TaskGroup as a pure UI concept. For the
>>> motivating example Jake provided, I suggest implementing both
>>> SubmitLongRunningJobTask and PollJobStatusSensor in a single operator. It
>>> can do something like what BaseSensorOperator.execute() does in
>>> "reschedule" mode, i.e. it first executes some code to submit the
>>> long-running job to the external service and stores the state (e.g. in
>>> XCom), then reschedules itself. Subsequent runs then poke for the
>>> completion state.
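The single-operator suggestion above can be pictured with a small, Airflow-free simulation. Everything in it is hypothetical: `FakeJobService` stands in for the external service, the plain `store` dict stands in for state persisted between invocations (e.g. XCom), and `run_once` plays the role of one scheduler invocation of the operator. It is a sketch of the control flow only, not of BaseSensorOperator itself.

```python
class FakeJobService:
    """Stand-in for an external service; a job finishes after N status checks."""

    def __init__(self, polls_until_done):
        self._polls_until_done = polls_until_done
        self._remaining = {}
        self._next_id = 0

    def submit(self):
        self._next_id += 1
        job_id = f"job-{self._next_id}"
        self._remaining[job_id] = self._polls_until_done
        return job_id

    def is_done(self, job_id):
        self._remaining[job_id] -= 1
        return self._remaining[job_id] <= 0


class SubmitAndPoll:
    """Submit on the first invocation, poll on every later one."""

    def __init__(self, service, store):
        self.service = service
        self.store = store  # survives between invocations (XCom in the proposal)

    def run_once(self):
        """Return True when the job is finished, False to ask for a reschedule."""
        job_id = self.store.get("job_id")
        if job_id is None:
            # First invocation: submit the job and remember its id.
            self.store["job_id"] = self.service.submit()
            return False
        # Later invocations: only check for completion, never resubmit.
        return self.service.is_done(job_id)


store = {}
operator = SubmitAndPoll(FakeJobService(polls_until_done=2), store)
attempts = 0
while True:
    attempts += 1
    if operator.run_once():
        break
print(attempts, store)
```

Between invocations nothing is held in memory except what is in `store`, which is what lets the worker slot be released while the external job runs.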
>>> 
>>> 
>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
>> <jferriero@google.com.invalid
>>>> 
>>> wrote:
>>> 
>>>> I really like this idea of a TaskGroup container as I think this will be
>>>> much easier to use than SubDag.
>>>> 
>>>> I'd like to propose an optional behavior for special retry mechanics via a
>>>> TaskGroup.retry_all property.
>>>> This way I could use TaskGroup to replace my favorite use of SubDag for
>>>> atomically retrying tasks of the pattern "act on external state then
>>>> reschedule poll until desired state reached".
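One reading of the proposed retry_all behaviour is "if any task in the group fails, clear every task in the group so the whole group runs again". The sketch below only illustrates that reading; `apply_retry_all` and the task names are invented for the example, and the actual semantics of TaskGroup.retry_all are not defined anywhere in this thread.

```python
def apply_retry_all(states, group_members):
    """Return new task states after applying a group-wide retry.

    states:        dict mapping task_id -> state string (None means "not run")
    group_members: set of task_ids that belong to the retry_all group
    """
    if any(states.get(task_id) == "failed" for task_id in group_members):
        # Clear every member so the group restarts from its first task;
        # tasks outside the group keep their state.
        return {
            task_id: (None if task_id in group_members else state)
            for task_id, state in states.items()
        }
    return dict(states)


states = {"submit_job": "success", "poll_job": "failed", "notify": "success"}
print(apply_retry_all(states, {"submit_job", "poll_job"}))
# {'submit_job': None, 'poll_job': None, 'notify': 'success'}
```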
>>>> 
>>>> The motivating use case I have for a SubDag is a very simple two-task group
>>>> [SubmitLongRunningJobTask >> PollJobStatusSensor].
>>>> I use SubDag because it gives me an easy way to retry the SubmitJobTask
>>>> if something about the PollJobSensor fails.
>>>> This pattern would be really nice for jobs that are expected to run a long
>>>> time (because the sensor can use reschedule mode, freeing up slots)
>>>> but might fail for a retryable reason.
>>>> However, using SubDag to meet this use case defeats the purpose because
>>>> SubDag infamously
>>>> <
>>>> 
>>> 
>> https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10
>>>>> 
>>>> blocks a "controller" slot for the entire duration.
>>>> This may feel like cyclic behavior, but in reality it is very common for a
>>>> single operator to submit a job and wait until it is done.
>>>> We could use this to refactor many operators (e.g. BQ, Dataproc,
>>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with an
>>>> optional reschedule mode if the user knows that the job may take a long
>>>> time.
>>>> 
>>>> I'd be happy to do the development work on adding this specific retry
>>>> behavior to TaskGroup once the base concept is implemented, if others in
>>>> the community would find this a useful feature.
>>>> 
>>>> Cheers,
>>>> Jake
>>>> 
>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com
>>> 
>>>> wrote:
>>>> 
>>>>> All for it :) . I think we are getting closer to have regular
>> planning
>>>> and
>>>>> making some structured approach to 2.0 and starting task force for it
>>>> soon,
>>>>> so I think this should be perfectly fine to discuss and even start
>>>>> implementing what's beyond as soon as we make sure that we are
>>>> prioritizing
>>>>> 2.0 work.
>>>>> 
>>>>> J,
>>>>> 
>>>>> 
>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yu...@gmail.com>
>> wrote:
>>>>> 
>>>>>> Hi Jarek,
>>>>>> 
>>>>>> I agree we should not change the behaviour of the existing SubDagOperator
>>>>>> till Airflow 2.1. Is it okay to continue the discussion about TaskGroup as
>>>>>> a brand new concept/feature independent from the existing SubDagOperator?
>>>>>> In other words, shall we add TaskGroup as a UI grouping concept like Ash
>>>>>> suggested, and not touch SubDagOperator at all? Whenever we are ready with
>>>>>> TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
>>>>>> 
>>>>>> I really like Ash's idea of simplifying the SubDagOperator idea into a
>>>>>> simple UI grouping concept. I think Xinbin's idea of "reattaching all the
>>>>>> tasks to the root DAG" is the way to go. And I see James pointed out we
>>>>>> need some helper functions to simplify dependencies setting of TaskGroup.
>>>>>> Xinbin put up a pretty elegant example in his PR
>>>>>> <https://github.com/apache/airflow/pull/9243>. I think having TaskGroup as
>>>>>> a UI concept should be a relatively small change. We can simplify Xinbin's
>>>>>> PR further. So I put up this alternative proposal here:
>>>>>> https://github.com/apache/airflow/pull/10153
>>>>>> 
>>>>>> I have not done any UI changes due to lack of experience with web UI. If
>>>>>> anyone's interested, please take a look at the PR.
>>>>>> 
>>>>>> Qian
>>>>>> 
>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
>>> Jarek.Potiuk@polidea.com
>>>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> Similar point here to the other ideas that are popping up. Maybe
>> we
>>>>>> should
>>>>>>> just focus on completing 2.0 and make all discussions about
>> further
>>>>>>> improvements to 2.1? While those are important discussions (and
>> we
>>>>> should
>>>>>>> continue them in the  near future !) I think at this point
>> focusing
>>>> on
>>>>>>> delivering 2.0 in its current shape should be our focus now ?
>>>>>>> 
>>>>>>> J.
>>>>>>> 
>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
>>> bin.huangxb@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Daniel
>>>>>>>> 
>>>>>>>> I agree that the TaskGroup should have the same API as a DAG
>>> object
>>>>>>> related
>>>>>>>> to task dependencies, but it will not have anything related to
>>>> actual
>>>>>>>> execution or scheduling.
>>>>>>>> I will update the AIP according to this over the weekend.
>>>>>>>> 
>>>>>>>>> We could even make a “DAGTemplate” object s.t. when you
>> import
>>>> the
>>>>>>> object
>>>>>>>> you can import it with parameters to determine the shape of the
>>>> DAG.
>>>>>>>> 
>>>>>>>> Can you elaborate a bit more on this? Does it serve a similar
>>>> purpose
>>>>>> as
>>>>>>> a
>>>>>>>> DAG factory function?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
>>>>>>> daniel.imberman@gmail.com
>>>>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Bin,
>>>>>>>>> 
>>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g. the
>>>>>>>>> bitwise operator for task dependencies)? We could even make a
>>>>>>>>> “DAGTemplate” object s.t. when you import the object you can import it
>>>>>>>>> with parameters to determine the shape of the DAG.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
>>>>> bin.huangxb@gmail.com
>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> The TaskGroup will not take schedule interval as a parameter
>>>>> itself,
>>>>>>> and
>>>>>>>> it
>>>>>>>>> depends on the DAG where it attaches to. In my opinion, the
>>>>> TaskGroup
>>>>>>>> will
>>>>>>>>> only contain a group of tasks with interdependencies, and the
>>>>>> TaskGroup
>>>>>>>>> behaves like a task. It doesn't contain any
>>> execution/scheduling
>>>>>> logic
>>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs etc.)
>>> like
>>>> a
>>>>>> DAG
>>>>>>>>> does.
>>>>>>>>> 
>>>>>>>>>> For example, there is the scenario that the schedule
>> interval
>>>> of
>>>>>> DAG
>>>>>>> is
>>>>>>>>> 1 hour and the schedule interval of TaskGroup is 20 min.
>>>>>>>>> 
>>>>>>>>> I am curious why you ask this. Is this a use case that you
>> want
>>>> to
>>>>>>>> achieve?
>>>>>>>>> 
>>>>>>>>> Bin
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
>> thanosxnicholas@gmail.com
>>>> 
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Bin,
>>>>>>>>>> Using TaskGroup, Is the schedule interval of TaskGroup the
>>> same
>>>>> as
>>>>>>> the
>>>>>>>>>> parent DAG? My main concern is whether the schedule
>> interval
>>> of
>>>>>>>> TaskGroup
>>>>>>>>>> could be different with that of the DAG? For example, there
>>> is
>>>>> the
>>>>>>>>> scenario
>>>>>>>>>> that the schedule interval of DAG is 1 hour and the
>> schedule
>>>>>> interval
>>>>>>>> of
>>>>>>>>>> TaskGroup is 20 min.
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> Nicholas
>>>>>>>>>> 
>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
>>>>>> bin.huangxb@gmail.com
>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Nicholas,
>>>>>>>>>>> 
>>>>>>>>>>> I am not sure about the old behavior of SubDagOperator,
>>> maybe
>>>>> it
>>>>>>> will
>>>>>>>>>> throw
>>>>>>>>>>> an error? But in the original proposal, the subdag's
>>>>>>>> schedule_interval
>>>>>>>>>> will
>>>>>>>>>>> be ignored. Or if we decide to use TaskGroup to replace
>>>> SubDag,
>>>>>>> there
>>>>>>>>>> will
>>>>>>>>>>> be no subdag schedule_interval.
>>>>>>>>>>> 
>>>>>>>>>>> Bin
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
>>>> thanosxnicholas@gmail.com
>>>>>> 
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>> Thanks for your good proposal. I was confused whether
>> the
>>>>>>> schedule
>>>>>>>>>>>> interval of SubDAG is different from that of the parent
>>>> DAG?
>>>>> I
>>>>>>> have
>>>>>>>>>>>> discussed with Jiajie Zhong about the schedule interval
>>> of
>>>>>>> SubDAG.
>>>>>>>> If
>>>>>>>>>> the
>>>>>>>>>>>> SubDagOperator has a different schedule interval, what
>>> will
>>>>>>> happen
>>>>>>>>> for
>>>>>>>>>>> the
>>>>>>>>>>>> scheduler to schedule the parent DAG?
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Nicholas Jiang
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
>>>>>>>> bin.huangxb@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone's feedback!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have rethought about the concept of subdag and task
>>>>>> groups. I
>>>>>>>>> think
>>>>>>>>>>> the
>>>>>>>>>>>>> better way to approach this is to entirely remove
>>> subdag
>>>>> and
>>>>>>>>>> introduce
>>>>>>>>>>>> the
>>>>>>>>>>>>> concept of TaskGroup, which is a container of tasks
>>> along
>>>>>> with
>>>>>>>>> their
>>>>>>>>>>>>> dependencies *without execution/scheduling logic as a
>>>> DAG*.
>>>>>> The
>>>>>>>>> only
>>>>>>>>>>>>> purpose of it is to group a list of tasks, but you
>>> still
>>>>> need
>>>>>>> to
>>>>>>>>> add
>>>>>>>>>> it
>>>>>>>>>>>> to
>>>>>>>>>>>>> a DAG for execution.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here is a small code snippet.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> class TaskGroup:
>>>>>>>>>>>>>     """
>>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
>>>>>>>>>>>>>     """
>>>>>>>>>>>>>     def __init__(self, group_id, default_args):
>>>>>>>>>>>>>         pass
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> """
>>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to a DAG
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This can be declared in a separate file from the dag file
>>>>>>>>>>>>> """
>>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
>>>>>>>>>>>>> download_group.add_task(task1)
>>>>>>>>>>>>> task2.dag = download_group
>>>>>>>>>>>>> 
>>>>>>>>>>>>> with download_group:
>>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [task1, task2] >> task3
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> """Add it to a DAG for execution"""
>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
>>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
>>>>>>>>>>>>>     start = DummyOperator(task_id='start')
>>>>>>>>>>>>>     start >> download_group
>>>>>>>>>>>>>     # this is equivalent to
>>>>>>>>>>>>>     # start >> [task1, task2] >> task3
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> 
>>>>>>>>>>>>> With this, we can still reuse a group of tasks and
>> set
>>>>>>>> dependencies
>>>>>>>>>>>> between
>>>>>>>>>>>>> them; it avoids the boilerplate code from using
>>>>>> SubDagOperator,
>>>>>>>> and
>>>>>>>>>> we
>>>>>>>>>>>> can
>>>>>>>>>>>>> declare dependencies as `task >> task_group >> task`.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> User migration wise, we can introduce it before
>> Airflow
>>>> 2.0
>>>>>> and
>>>>>>>>> allow
>>>>>>>>>>>>> gradual transition. Then we can decide if we still
>> want
>>>> to
>>>>>> keep
>>>>>>>> the
>>>>>>>>>>>>> SubDagOperator or simply remove it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Any thoughts?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Bin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
>>>>>>>>>>>>> maximebeauchemin@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +1, proposal looks good.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The original intention was really to have tasks
>>> groups
>>>>> and
>>>>>> a
>>>>>>>>>>>> zoom-in/out
>>>>>>>>>>>>> in
>>>>>>>>>>>>>> the UI. The original reasoning was to reuse the DAG
>>>>> object
>>>>>>>> since
>>>>>>>>> it
>>>>>>>>>>> is
>>>>>>>>>>>> a
>>>>>>>>>>>>>> group of tasks, but as highlighted here it does
>>> create
>>>>>>>> underlying
>>>>>>>>>>>>>> confusions since a DAG is much more than just a
>> group
>>>> of
>>>>>>> tasks.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Max
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
>>>>>>>>>>>>> joshipoornima06@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thank you for your email.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
>>>>>>>>>>> bin.huangxb@gmail.com
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This
>>>>>> rewrites
>>>>>>>> the
>>>>>>>>>>>>>>>> *DagBag.bag_dag*
>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
>> it
>>>>> will
>>>>>>>> give a
>>>>>>>>>>>> flat
>>>>>>>>>>>>>>>>>> structure at
>>>>>>>>>>>>>>>>>> the task level
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The serialized_dag representation already
>> does
>>>>> this I
>>>>>>>>> think.
>>>>>>>>>> At
>>>>>>>>>>>>> least
>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>> I've understood your idea here correctly.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I am not sure about serialized_dag
>>> representation,
>>>>> but
>>>>>> at
>>>>>>>>> least
>>>>>>>>>>> it
>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>> still keep the subdag entry in the DAG table?
>> In
>>> my
>>>>>>>> proposal
>>>>>>>>> as
>>>>>>>>>>>> also
>>>>>>>>>>>>> in
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> draft PR, the idea is to *extract the tasks
>> from
>>>> the
>>>>>>> subdag
>>>>>>>>> and
>>>>>>>>>>> add
>>>>>>>>>>>>>> them
>>>>>>>>>>>>>>>> back to the root_dag. *So the runtime DAG graph
>>>> will
>>>>>> look
>>>>>>>>>> exactly
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> same as without subdag but with metadata
>> attached
>>>> to
>>>>>>> those
>>>>>>>>>>>> sections.
>>>>>>>>>>>>>>> These
>>>>>>>>>>>>>>>> metadata will be later on used to render in the
>>> UI.
>>>>> So
>>>>>>>> after
>>>>>>>>>>>> parsing
>>>>>>>>>>>>> (
>>>>>>>>>>>>>>>> *DagBag.process_file()*), it will just output
>> the
>>>>>>> *root_dag
>>>>>>>>>>>> *instead
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> *root_dag +
>>>>>>>>>>>>>>>> subdag + subdag + nested subdag* etc.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata
>>>>>>>>>> current_group=section-1,
>>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (welcome for
>>> naming
>>>>>>>>>>> suggestions),
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> reason for parent_group is that we can have
>>> nested
>>>>>> group
>>>>>>>> and
>>>>>>>>>>>> still
>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>> able to capture the dependency.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Runtime DAG:
>>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> While at the UI, what we see would be something
>>>> like
>>>>>> this
>>>>>>>> by
>>>>>>>>>>>>> utilizing
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> metadata, and then we can expand or zoom into
>> in
>>>> some
>>>>>>> way.
>>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The benefits I can see is that:
>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra
>>> complexity
>>>> of
>>>>>>>> SubDag
>>>>>>>>>> for
>>>>>>>>>>>>>>> execution
>>>>>>>>>>>>>>>> and scheduling. It will be the same as not
>> using
>>>>>> SubDag.
>>>>>>>>>>>>>>>> 2. Still have the benefits of modularized and
>>>>> reusable
>>>>>>> dag
>>>>>>>>> code
>>>>>>>>>>> and
>>>>>>>>>>>>>>>> declare dependencies between them. And with the
>>> new
>>>>>>>>>>> SubDagOperator
>>>>>>>>>>>>> (see
>>>>>>>>>>>>>>> AIP
>>>>>>>>>>>>>>>> or draft PR), we can use the same dag_factory
>>>>> function
>>>>>>> for
>>>>>>>>>>>>> generating 1
>>>>>>>>>>>>>>>> dag, a lot of dynamic dags, or used for SubDag
>>> (in
>>>>> this
>>>>>>>> case,
>>>>>>>>>> it
>>>>>>>>>>>> will
>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>> extract all underlying tasks and append to the
>>> root
>>>>>> dag).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> - Then it gets to the idea of replacing subdag
>>>> with a
>>>>>>>>>> simpler
>>>>>>>>>>>>>> concept
>>>>>>>>>>>>>>>> by Ash: the proposed change basically drains
>> out
>>>> the
>>>>>>>>>> contents
>>>>>>>>>>>> of
>>>>>>>>>>>>> a
>>>>>>>>>>>>>>> SubDag
>>>>>>>>>>>>>>>> and becomes more like
>>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator
>>>>>>>>>>>>>>> (forgive
>>>>>>>>>>>>>>>> me about the crazy name..). In this case, it is
>>>> still
>>>>>>>>>>> necessary
>>>>>>>>>>>> to
>>>>>>>>>>>>>>> keep the
>>>>>>>>>>>>>>>> concept of subdag as it is nothing more than a
>>>> name?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks
>>>> Chris
>>>>>>> Palmer
>>>>>>>>> for
>>>>>>>>>>>>> helping
>>>>>>>>>>>>>>>> conceptualize the functionality of TaskGroup, I
>>>> will
>>>>>> just
>>>>>>>>> paste
>>>>>>>>>>> it
>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks
>> in
>>>> the
>>>>>> same
>>>>>>>>>>>> TaskGroup,
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in
>> a
>>>>>>> TaskGroup
>>>>>>>>>> and
>>>>>>>>>>>>>> either a
>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not
>> in
>>>> any
>>>>>>> group
>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
>>> TaskGroup
>>>>> and
>>>>>>>>>> either
>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
>> as
>>> a
>>>>>> single
>>>>>>>>>>>> "object",
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
>>>>>>>>>>>>>>>>> - You'd need some way to determine what the
>>>>> "status"
>>>>>>> of a
>>>>>>>>>>>>> TaskGroup
>>>>>>>>>>>>>>> was
>>>>>>>>>>>>>>>>> at least for UI display purposes
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I agree with Chris:
>>>>>>>>>>>>>>>> - From the backend's view (scheduler &
>>> executor), I
>>>>>> think
>>>>>>>>>>> TaskGroup
>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>> be ignored during execution. (unless we decide
>> to
>>>>>>> implement
>>>>>>>>>> some
>>>>>>>>>>>>>> metadata
>>>>>>>>>>>>>>>> operations that allows start/stop a group of
>>> tasks
>>>>>> etc.)
>>>>>>>>>>>>>>>> - From the UI's View, it should be able to pick
>>> up
>>>>> the
>>>>>>>>>> individual
>>>>>>>>>>>>>> tasks'
>>>>>>>>>>>>>>>> status and then determine the TaskGroup's
>> status
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Bin
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel
>> Imberman
>>> <
>>>>>>>>>>>>>>>> daniel.imberman@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator
>>> to
>>>>> tie
>>>>>>> dags
>>>>>>>>>>>> together
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>> think that sounds pretty great! I wonder if we
>>>> could
>>>>>>>>>> essentially
>>>>>>>>>>>>> write
>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> the ability to set dependencies to all
>>>> starter-tasks
>>>>>> for
>>>>>>>>> that
>>>>>>>>>>> DAG.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly
>> UI
>>>>>> concept.
>>>>>>>> It
>>>>>>>>>>>> doesn’t
>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>> to execute separately, you’re just adding more
>>>> tasks
>>>>>> to
>>>>>>>> the
>>>>>>>>>>> queue
>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>> be executed when there are resources
>> available.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
>> <
>>>>>>>>>>> chris@crpalmer.com
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex
>>>>>> abstraction.
>>>>>>> I
>>>>>>>>>> think
>>>>>>>>>>>> what
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> needed/useful is a TaskGroup concept. On a
>> high
>>>>> level
>>>>>> I
>>>>>>>>> think
>>>>>>>>>>> you
>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>> functionality:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in
>>> the
>>>>>> same
>>>>>>>>>>> TaskGroup,
>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> *cannot* have dependencies between a Task in a
>>>>>> TaskGroup
>>>>>>>> and
>>>>>>>>>>>> either
>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> Task in a different TaskGroup or a Task not in
>>> any
>>>>>> group
>>>>>>>>>>>>>>>>> - You *can* have dependencies between a
>>> TaskGroup
>>>>> and
>>>>>>>> either
>>>>>>>>>>> other
>>>>>>>>>>>>>>>>> TaskGroups or Tasks not in any group
>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup
>> as a
>>>>>> single
>>>>>>>>>>> "object",
>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> which you expand or zoom into in some way
>>>>>>>>>>>>>>>>> - You'd need some way to determine what the
>>>> "status"
>>>>>> of
>>>>>>> a
>>>>>>>>>>>> TaskGroup
>>>>>>>>>>>>>> was
>>>>>>>>>>>>>>>>> at least for UI display purposes
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Not sure if it would need to be a top level
>>> object
>>>>>> with
>>>>>>>> its
>>>>>>>>>> own
>>>>>>>>>>>>>> database
>>>>>>>>>>>>>>>>> table and model or just another attribute on
>>>> tasks.
>>>>> I
>>>>>>>> think
>>>>>>>>>> you
>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>> build
>>>>>>>>>>>>>>>>> it in a way such that from the schedulers
>> point
>>> of
>>>>>> view
>>>>>>> a
>>>>>>>>> DAG
>>>>>>>>>>> with
>>>>>>>>>>>>>>>>> TaskGroups doesn't get treated any
>> differently.
>>> So
>>>>> it
>>>>>>>> really
>>>>>>>>>>> just
>>>>>>>>>>>>>>> becomes
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> shortcut for setting dependencies between sets
>>> of
>>>>>> Tasks,
>>>>>>>> and
>>>>>>>>>>>> allows
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> UI
>>>>>>>>>>>>>>>>> to simplify the render of the DAG structure.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
>>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Agree with James (and think it's actually
>> the
>>>> more
>>>>>>>>> important
>>>>>>>>>>>> issue
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> fix),
>>>>>>>>>>>>>>>>>> but I am still convinced Ash' idea is the
>>> right
>>>>> way
>>>>>>>>> forward
>>>>>>>>>>>> (just
>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>> require a bit more work to deprecate than
>>> adding
>>>>>>> visual
>>>>>>>>>>> grouping
>>>>>>>>>>>>> in
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> UI).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> There was a previous thread about this FYI with more context on why
>>>>>>>>>>>>>>>>>> subdags are bad and potential solutions:
>>>>>>>>>>>>>>>>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
>>>>>>>>>>>>>>>>>> A solution I outline there to James's problem is e.g. enabling the
>>>>>>>>>>>>>>>>>> `>>` operator for Airflow operators to work with DAGs as well. I see
>>>>>>>>>>>>>>>>>> this being separate from Ash's solution for DAG grouping in the UI,
>>>>>>>>>>>>>>>>>> but one of the two items required to replace all existing subdag
>>>>>>>>>>>>>>>>>> functionality.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years
>> and
>>>>> they
>>>>>>> are
>>>>>>>>>>> always a
>>>>>>>>>>>>>> giant
>>>>>>>>>>>>>>>>> pain
>>>>>>>>>>>>>>>>>> to use. They are a constant source of user
>>>>> confusion
>>>>>>> and
>>>>>>>>>>>> breakages
>>>>>>>>>>>>>>>>> during
>>>>>>>>>>>>>>>>>> upgrades. Would love to see them gone :).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James
>> Coder <
>>>>>>>>>>>> jcoder01@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a
>> UI
>>>>>>> concept. I
>>>>>>>>> use
>>>>>>>>>>> the
>>>>>>>>>>>>>>> subdag
>>>>>>>>>>>>>>>>>>> operator to simplify dependencies too. If
>>> you
>>>>>> have a
>>>>>>>>> group
>>>>>>>>>>> of
>>>>>>>>>>>>>> tasks
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> need to finish before another group of
>> tasks
>>>>>> start,
>>>>>>>>> using
>>>>>>>>>> a
>>>>>>>>>>>>> subdag
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> pretty quick way to set those dependencies
>>>> and I
>>>>>>> think
>>>>>>>>>> also
>>>>>>>>>>>> make
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>> easier
>>>>>>>>>>>>>>>>>>> to follow the dag code.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle
>> Hamlin
>>> <
>>>>>>>>>>>>> hamlin.kn@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash
>>>>>> Berlin-Taylor
>>>>>>> <
>>>>>>>>>>>>>> ash@apache.org
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Question:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator
>>>> anymore?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just
>>>>> replacing
>>>>>> it
>>>>>>>>> with
>>>>>>>>>> a
>>>>>>>>>>> UI
>>>>>>>>>>>>>>>>> grouping
>>>>>>>>>>>>>>>>>>>>> concept be conceptually simpler, less
>> to
>>>> get
>>>>>>>> wrong,
>>>>>>>>>> and
>>>>>>>>>>>>> closer
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>>>>>>>> users actually want to achieve with
>>>> subdags?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in subdags could start running in
>>>>>>>>>>>>>>>>>>>>> parallel (a good change) -- so should we not also just _entirely_
>>>>>>>>>>>>>>>>>>>>> remove the concept of a sub dag and replace it with something simpler?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think. I haven't used them extensively so may
>>>>>>>>>>>>>>>>>>>>> be wrong on some of these):
>>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it has(?) to be of the form
>>>>>>>>>>>>>>>>>>>>> `parent_dag_id.subdag_id`.
>>>>>>>>>>>>>>>>>>>>> - They need their own schedule_interval, but it has to match the
>>>>>>>>>>>>>>>>>>>>> parent dag
>>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own. (Does it make sense to do this?
>>>>>>>>>>>>>>>>>>>>> Pausing just a sub dag would mean the sub dag would never execute, so
>>>>>>>>>>>>>>>>>>>>> the SubDagOperator would fail too.)
>>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to operate a subdag with -- always a
>>>>>>>>>>>>>>>>>>>>> bit of a kludge.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> -ash
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash
>>>>>> Berlin-Taylor <
>>>>>>>>>>>>>> ash@apache.org>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed, I'm excited to see how this
>>>>>>>>>>>>>>>>>>>>>> progresses.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
>>> parsing*:
>>>>> This
>>>>>>>>>> rewrites
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
>>> parsing,
>>>>> and
>>>>>> it
>>>>>>>>> will
>>>>>>>>>>>> give a
>>>>>>>>>>>>>>> flat
>>>>>>>>>>>>>>>>>>>>>>> structure at
>>>>>>>>>>>>>>>>>>>>>>> the task level
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation
>>>> already
>>>>>> does
>>>>>>>>> this
>>>>>>>>>> I
>>>>>>>>>>>>> think.
>>>>>>>>>>>>>>> At
>>>>>>>>>>>>>>>>>> least
>>>>>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here
>>>> correctly.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> -ash
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin
>>>> Huang <
>>>>>>>>>>>>>>> bin.huangxb@gmail.com
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone and
>>>> collect
>>>>>>>>> feedback
>>>>>>>>>> on
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> AIP-34
>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>> rewriting SubDagOperator. This was
>>>>>> previously
>>>>>>>>>> briefly
>>>>>>>>>>>>>>>>> mentioned in
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> discussion about what needs to be
>>> done
>>>>> for
>>>>>>>>> Airflow
>>>>>>>>>>> 2.0,
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> one of
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> ideas is to make SubDagOperator
>>> attach
>>>>>> tasks
>>>>>>>> back
>>>>>>>>>> to
>>>>>>>>>>>> the
>>>>>>>>>>>>>> root
>>>>>>>>>>>>>>>>> DAG.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving
>>>>>> SubDagOperator
>>>>>>>>>> related
>>>>>>>>>>>>>> issues
>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>>>> reattaching
>>>>>>>>>>>>>>>>>>>>>>> all tasks back to the root dag
>> while
>>>>>>> respecting
>>>>>>>>>>>>>> dependencies
>>>>>>>>>>>>>>>>>> during
>>>>>>>>>>>>>>>>>>>>>>> parsing. The original grouping
>> effect
>>>> on
>>>>>> the
>>>>>>> UI
>>>>>>>>>> will
>>>>>>>>>>> be
>>>>>>>>>>>>>>>>> achieved
>>>>>>>>>>>>>>>>>>>> through
>>>>>>>>>>>>>>>>>>>>>>> grouping related tasks by metadata.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory
>>>> function
>>>>>> more
>>>>>>>>>>> reusable
>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>> don't
>>>>>>>>>>>>>>>>>>>>>>> need to have parent_dag_name and
>>>>>>> child_dag_name
>>>>>>>>> in
>>>>>>>>>>> the
>>>>>>>>>>>>>>> function
>>>>>>>>>>>>>>>>>>>>> signature
>>>>>>>>>>>>>>>>>>>>>>> anymore.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag
>>> parsing*:
>>>>> This
>>>>>>>>>> rewrites
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while
>>> parsing,
>>>>> and
>>>>>> it
>>>>>>>>> will
>>>>>>>>>>>> give a
>>>>>>>>>>>>>>> flat
>>>>>>>>>>>>>>>>>>>>>>> structure at
>>>>>>>>>>>>>>>>>>>>>>> the task level
>>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The
>> new
>>>>>>>>> SubDagOperator
>>>>>>>>>>>> acts
>>>>>>>>>>>>>>> like a
>>>>>>>>>>>>>>>>>>>>>>> container and most of the original
>>>>> methods
>>>>>>> are
>>>>>>>>>>> removed.
>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>>>>>>> signature is
>>>>>>>>>>>>>>>>>>>>>>> also changed to *subdag_factory
>> *with
>>>>>>>>> *subdag_args
>>>>>>>>>>> *and
>>>>>>>>>>>>>>>>>>>>> *subdag_kwargs*.
>>>>>>>>>>>>>>>>>>>>>>> This is similar to the
>> PythonOperator
>>>>>>>> signature.
>>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add
>>>>>>> current_group
>>>>>>>> &
>>>>>>>>>>>>>> parent_group
>>>>>>>>>>>>>>>>>>>>> attributes
>>>>>>>>>>>>>>>>>>>>>>> to BaseOperator*: This metadata is
>>> used
>>>>> to
>>>>>>>> group
>>>>>>>>>>> tasks
>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>> rendering at
>>>>>>>>>>>>>>>>>>>>>>> UI level. It may potentially extend
>>>>> further
>>>>>>> to
>>>>>>>>>> group
>>>>>>>>>>>>>>> arbitrary
>>>>>>>>>>>>>>>>>>> tasks
>>>>>>>>>>>>>>>>>>>>>>> outside the context of subdag to
>>> allow
>>>>>>>>> group-level
>>>>>>>>>>>>>> operations
>>>>>>>>>>>>>>>>>>> (i.e.
>>>>>>>>>>>>>>>>>>>>>>> stop/trigger a group of task within
>>> the
>>>>>> dag)
>>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*:
>> Proposed
>>>> UI
>>>>>>>>>> modification
>>>>>>>>>>>> to
>>>>>>>>>>>>>>> allow
>>>>>>>>>>>>>>>>>>>>>>> (un)collapse a group of tasks for a
>>>> flat
>>>>>>>>> structure
>>>>>>>>>> to
>>>>>>>>>>>>> pair
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>>>>>>>>>> change instead of the original
>>>>> hierarchical
>>>>>>>>>>> structure.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Please see related documents and
>> PRs
>>>> for
>>>>>>>> details:
>>>>>>>>>>>>>>>>>>>>>>> AIP:
>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
>>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any
>>>>> aspects
>>>>>>>> that
>>>>>>>>>> you
>>>>>>>>>>>>>>>>>> agree/disagree
>>>>>>>>>>>>>>>>>>>>>>> with or
>>>>>>>>>>>>>>>>>>>>>>> need more clarification (especially
>>> the
>>>>>> third
>>>>>>>>>> change
>>>>>>>>>>>>>>> regarding
>>>>>>>>>>>>>>>>>>>>> TaskGroup).
>>>>>>>>>>>>>>>>>>>>>>> Any comments are welcome and I am
>>>> looking
>>>>>>>> forward
>>>>>>>>>> to
>>>>>>>>>>>> it!
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>> Bin
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Kyle Hamlin
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Thanks & Regards
>>>>>>>>>>>>>>> Poornima
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> 
>>>>>>> Jarek Potiuk
>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>>> 
>>>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
>>>>> <+48%20660%20796%20129>>
>>>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Jarek Potiuk
>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>> 
>>>>> M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
>>>>> <+48%20660%20796%20129>>
>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> 
>>>> *Jacob Ferriero*
>>>> 
>>>> Strategic Cloud Engineer: Data Engineering
>>>> 
>>>> jferriero@google.com
>>>> 
>>>> 617-714-2509
>>>> 
>>> 
>> 


Re: [AIP-34] Rewrite SubDagOperator

Posted by Gerard Casas Saez <gc...@twitter.com.INVALID>.
First of all, this is awesome!!

Secondly, checking your UI code, it seems you are loading all operators at
once. I wonder if we can load them as needed (i.e. load a TaskGroup's tasks
only when we click on it). Some of our DAGs are so large that they take
forever to load in the Graph view, so I'm worried about this still being an
issue here. It may be easily solvable by implementing lazy loading of the
graph. I'm not sure how easy that is to implement or add to the UI extension
(and I don't want to push for early optimization, as it's the root of all evil).
Gerard Casas Saez
Twitter | Cortex | @casassaez <http://twitter.com/casassaez>


On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bi...@gmail.com> wrote:

> Hi Yu,
>
> Thank you so much for taking this on. I was fairly distracted previously
> and I didn't have the time to update the proposal. In fact, after
> discussing with Ash, Kaxil and Daniel, the direction of this AIP has been
> changed to favor the concept of TaskGroup instead of rewriting
> SubDagOperator (though it may make sense to deprecate SubDag at a future
> date).
>
> Your PR is amazing and it has implemented the desired features. I think we
> can focus on your new PR instead. Do you mind updating the AIP based on
> what you have done in your PR?
>
> Best,
> Bin
>
>
> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yu...@gmail.com> wrote:
>
> > Hi all, I've added the basic UI changes to my proposed implementation of
> > TaskGroup as a UI grouping concept:
> > https://github.com/apache/airflow/pull/10153
> >
> > I think Chris had a pretty good specification of TaskGroup so I'm quoting
> > it here. The only thing I don't fully agree with is the restriction
> > "... *cannot* have dependencies between a Task in a TaskGroup and either a
> > Task in a different TaskGroup or a Task not in any group". I think
> > this is overly restrictive. Since TaskGroup is a UI concept, tasks can have
> > dependencies on tasks in other TaskGroups or on tasks not in any TaskGroup.
> > In my PR, this is allowed. The graph edges will update accordingly when
> > TaskGroups are expanded/collapsed. TaskGroup only helps to make the UI look
> > less crowded. Under the hood, everything is still a DAG of tasks and edges,
> > so things work normally. Here's a screenshot of the UI interaction:
> > https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif
> >
> >
> > - Tasks can be added to a TaskGroup
> > - You *can* have dependencies between Tasks in the same TaskGroup, but
> >   *cannot* have dependencies between a Task in a TaskGroup and either a
> >   Task in a different TaskGroup or a Task not in any group
> > - You *can* have dependencies between a TaskGroup and either other
> >   TaskGroups or Tasks not in any group
> > - The UI will by default render a TaskGroup as a single "object", but
> >   which you expand or zoom into in some way
> > - You'd need some way to determine what the "status" of a TaskGroup was
> >   at least for UI display purposes
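The point about graph edges updating when TaskGroups are expanded or collapsed can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the PR: `visible_edges`, `group_of` and `collapsed` are hypothetical names, group ids are assumed not to clash with task ids, and nested groups are ignored.

```python
def visible_edges(edges, group_of, collapsed):
    """Return the edges to draw when some groups are collapsed.

    edges:     iterable of (upstream_task_id, downstream_task_id)
    group_of:  dict mapping task_id -> group_id (ungrouped tasks are absent)
    collapsed: set of group_ids currently rendered as a single node
    """
    def node(task_id):
        # A task inside a collapsed group is drawn as the group itself.
        group = group_of.get(task_id)
        return group if group in collapsed else task_id

    drawn = set()
    for upstream, downstream in edges:
        src, dst = node(upstream), node(downstream)
        if src != dst:  # edges internal to a collapsed group disappear
            drawn.add((src, dst))
    return sorted(drawn)
```

The underlying task-level edges never change; only the drawn edges are recomputed, which is why cross-group dependencies do not need to be forbidden for the grouping to work.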
> >
> >
> > Regarding Jake's comment, I agree it's possible to implement the "retrying
> > tasks in a group" pattern he mentioned as an optional feature of TaskGroup,
> > although that may go against having TaskGroup as a pure UI concept. For the
> > motivating example Jake provided, I suggest implementing both
> > SubmitLongRunningJobTask and PollJobStatusSensor in a single operator. It
> > can do something like what BaseSensorOperator.execute() does in "reschedule"
> > mode, i.e. it first executes some code to submit the long-running job to
> > the external service and stores the state (e.g. in XCom), then reschedules
> > itself. Subsequent runs then poke for the completion state.
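A minimal sketch of the single-operator pattern described above, written in plain Python rather than against the Airflow API. `FakeJobService`, `SubmitAndPollTask`, `run_until_done` and the `state` dict (standing in for XCom) are hypothetical names used only for illustration; a real operator would signal a reschedule to the scheduler instead of returning a string.

```python
class FakeJobService:
    """Stand-in for an external service; a job is done after `polls_needed` checks."""

    def __init__(self, polls_needed):
        self.polls_needed = polls_needed
        self.polls = {}

    def submit(self):
        job_id = "job-%d" % (len(self.polls) + 1)
        self.polls[job_id] = 0
        return job_id

    def is_done(self, job_id):
        self.polls[job_id] += 1
        return self.polls[job_id] >= self.polls_needed


class SubmitAndPollTask:
    """Hypothetical task: submit on the first run, poke on later runs."""

    def __init__(self, service):
        self.service = service

    def execute(self, state):
        # `state` plays the role of XCom: it survives between runs.
        if "job_id" not in state:
            state["job_id"] = self.service.submit()
            return "reschedule"  # give the slot back until the next poke
        if self.service.is_done(state["job_id"]):
            return "success"
        return "reschedule"


def run_until_done(task, max_runs=10):
    """Tiny driver that re-runs the task the way a scheduler would."""
    state, outcomes = {}, []
    for _ in range(max_runs):
        outcomes.append(task.execute(state))
        if outcomes[-1] == "success":
            break
    return state, outcomes
```

Because the job id is stored outside the task, a failure during polling can be retried without resubmitting, or the stored id can be cleared to force a fresh submit; which of the two is wanted is a design choice this sketch leaves open.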
> >
> >
> > On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> <jferriero@google.com.invalid
> > >
> > wrote:
> >
> > > I really like this idea of a TaskGroup container as I think this will be
> > > much easier to use than SubDag.
> > >
> > > I'd like to propose an optional behavior for special retry mechanics via a
> > > TaskGroup.retry_all property.
> > > This way I could use TaskGroup to replace my favorite use of SubDag for
> > > atomically retrying tasks of the pattern "act on external state then
> > > reschedule poll until desired state reached".
> > >
> > > The motivating use case I have for a SubDag is a very simple two-task group
> > > [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > I use SubDag because it gives me an easy way to retry the SubmitJobTask
> > > if something about the PollJobSensor fails.
> > > This pattern would be really nice for jobs that are expected to run a long
> > > time (because the sensor can use reschedule mode, freeing up slots)
> > > but might fail for a retryable reason.
> > > However, using SubDag to meet this use case defeats the purpose because
> > > SubDag infamously
> > > <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> > > blocks a "controller" slot for the entire duration.
> > > This may feel like a cyclic behavior, but in reality it is very common for a
> > > single operator to submit a job and wait until it is done.
> > > We could use this to refactor many operators (e.g. BQ, Dataproc,
> > > Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with an
> > > optional reschedule mode if the user knows that this job may take a long
> > > time.
> > >
> > > I'd be happy to do the development work on adding this specific retry
> > > behavior to TaskGroup once the base concept is implemented if others in the
> > > community would find this a useful feature.
> > >
> > > Cheers,
> > > Jake
> > >
> > > On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com
> >
> > > wrote:
> > >
> > > > All for it :) . I think we are getting closer to having regular planning
> > > > and a structured approach to 2.0, and to starting a task force for it
> > > > soon, so I think this should be perfectly fine to discuss and even start
> > > > implementing what's beyond as soon as we make sure that we are
> > > > prioritizing 2.0 work.
> > > >
> > > > J,
> > > >
> > > >
> > > > On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yu...@gmail.com>
> wrote:
> > > >
> > > > > Hi Jarek,
> > > > >
> > > > > I agree we should not change the behaviour of the existing
> > > > > SubDagOperator till Airflow 2.1. Is it okay to continue the discussion
> > > > > about TaskGroup as a brand new concept/feature independent from the
> > > > > existing SubDagOperator? In other words, shall we add TaskGroup as a UI
> > > > > grouping concept like Ash suggested, and not touch SubDagOperator at
> > > > > all? Whenever we are ready with TaskGroup, we then deprecate
> > > > > SubDagOperator in Airflow 2.1.
> > > > >
> > > > > I really like Ash's idea of simplifying the SubDagOperator idea into a
> > > > > simple UI grouping concept. I think Xinbin's idea of "reattaching all
> > > > > the tasks to the root DAG" is the way to go. And I see James pointed out
> > > > > we need some helper functions to simplify dependencies setting of
> > > > > TaskGroup. Xinbin put up a pretty elegant example in his PR
> > > > > <https://github.com/apache/airflow/pull/9243>. I think having TaskGroup
> > > > > as a UI concept should be a relatively small change. We can simplify
> > > > > Xinbin's PR further. So I put up this alternative proposal here:
> > > > > https://github.com/apache/airflow/pull/10153
> > > > >
> > > > > I have not done any UI changes due to lack of experience with web UI.
> > > > > If anyone's interested, please take a look at the PR.
> > > > >
> > > > > Qian
> > > > >
> > > > > On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> > Jarek.Potiuk@polidea.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Similar point here to the other ideas that are popping up. Maybe we
> > > > > > should just focus on completing 2.0 and move all discussions about
> > > > > > further improvements to 2.1? While those are important discussions
> > > > > > (and we should continue them in the near future!), I think at this
> > > > > > point delivering 2.0 in its current shape should be our focus.
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> > bin.huangxb@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Daniel
> > > > > > >
> > > > > > > I agree that the TaskGroup should have the same API as a DAG object
> > > > > > > related to task dependencies, but it will not have anything related
> > > > > > > to actual execution or scheduling.
> > > > > > > I will update the AIP according to this over the weekend.
> > > > > > >
> > > > > > > > We could even make a “DAGTemplate” object s.t. when you import the
> > > > > > > > object you can import it with parameters to determine the shape of
> > > > > > > > the DAG.
> > > > > > >
> > > > > > > Can you elaborate a bit more on this? Does it serve a similar
> > > > > > > purpose as a DAG factory function?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > > > > > daniel.imberman@gmail.com
> > > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > Why not give the TaskGroup the same API as a DAG object (e.g. the
> > > > > > > > bitwise operator for task dependencies)? We could even make a
> > > > > > > > “DAGTemplate” object s.t. when you import the object you can import
> > > > > > > > it with parameters to determine the shape of the DAG.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > > > bin.huangxb@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > The TaskGroup will not take a schedule interval as a parameter
> > > > > > > > itself; it depends on the DAG it is attached to. In my opinion, the
> > > > > > > > TaskGroup will only contain a group of tasks with interdependencies,
> > > > > > > > and the TaskGroup behaves like a task. It doesn't contain any
> > > > > > > > execution/scheduling logic (i.e. schedule_interval, concurrency,
> > > > > > > > max_active_runs etc.) like a DAG does.
> > > > > > > >
> > > > > > > > > For example, there is the scenario that the schedule interval of
> > > > > > > > > DAG is 1 hour and the schedule interval of TaskGroup is 20 min.
> > > > > > > >
> > > > > > > > I am curious why you ask this. Is this a use case that you want to
> > > > > > > > achieve?
> > > > > > > >
> > > > > > > > Bin
> > > > > > > >
> > > > > > > > On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <
> thanosxnicholas@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Bin,
> > > > > > > > > Using TaskGroup, is the schedule interval of the TaskGroup the same
> > > > > > > > > as that of the parent DAG? My main concern is whether the schedule
> > > > > > > > > interval of a TaskGroup could be different from that of the DAG.
> > > > > > > > > For example, there is the scenario that the schedule interval of
> > > > > > > > > the DAG is 1 hour and the schedule interval of the TaskGroup is
> > > > > > > > > 20 min.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Nicholas
> > > > > > > > >
> > > > > > > > > On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > > > > bin.huangxb@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Nicholas,
> > > > > > > > > >
> > > > > > > > > > I am not sure about the old behavior of SubDagOperator, maybe it
> > > > > > > > > > will throw an error? But in the original proposal, the subdag's
> > > > > > > > > > schedule_interval will be ignored. Or if we decide to use
> > > > > > > > > > TaskGroup to replace SubDag, there will be no subdag
> > > > > > > > > > schedule_interval.
> > > > > > > > > >
> > > > > > > > > > Bin
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > > thanosxnicholas@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Bin,
> > > > > > > > > > > Thanks for your good proposal. I was wondering whether the
> > > > > > > > > > > schedule interval of a SubDAG can be different from that of the
> > > > > > > > > > > parent DAG. I have discussed the schedule interval of SubDAG
> > > > > > > > > > > with Jiajie Zhong. If the SubDagOperator has a different
> > > > > > > > > > > schedule interval, what will happen when the scheduler
> > > > > > > > > > > schedules the parent DAG?
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Nicholas Jiang
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > > > > > bin.huangxb@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thank you Max, Kaxil, and everyone for your feedback!
> > > > > > > > > > > >
> > > > > > > > > > > > I have rethought the concept of subdag and task groups. I think
> > > > > > > > > > > > the better way to approach this is to entirely remove subdag and
> > > > > > > > > > > > introduce the concept of TaskGroup, which is a container of tasks
> > > > > > > > > > > > along with their dependencies *without the execution/scheduling
> > > > > > > > > > > > logic of a DAG*. Its only purpose is to group a list of tasks,
> > > > > > > > > > > > but you still need to add it to a DAG for execution.
> > > > > > > > > > > >
> > > > > > > > > > > > Here is a small code snippet.
> > > > > > > > > > > >
> > > > > > > > > > > > ```
> > > > > > > > > > > > class TaskGroup:
> > > > > > > > > > > >     """
> > > > > > > > > > > >     A TaskGroup contains a group of tasks.
> > > > > > > > > > > >
> > > > > > > > > > > >     If default_args is missing, it will take default args from the DAG.
> > > > > > > > > > > >     """
> > > > > > > > > > > >     def __init__(self, group_id, default_args):
> > > > > > > > > > > >         pass
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > """
> > > > > > > > > > > > You can add tasks to a task group similar to adding tasks to a DAG
> > > > > > > > > > > >
> > > > > > > > > > > > This can be declared in a separate file from the dag file
> > > > > > > > > > > > """
> > > > > > > > > > > > download_group = TaskGroup(group_id='download', default_args=default_args)
> > > > > > > > > > > > download_group.add_task(task1)
> > > > > > > > > > > > task2.dag = download_group
> > > > > > > > > > > >
> > > > > > > > > > > > with download_group:
> > > > > > > > > > > >     task3 = DummyOperator(task_id='task3')
> > > > > > > > > > > >
> > > > > > > > > > > > [task1, task2] >> task3
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > """Add it to a DAG for execution"""
> > > > > > > > > > > > with DAG(dag_id='start_download_dag', default_args=default_args,
> > > > > > > > > > > >          schedule_interval='@daily', ...) as dag:
> > > > > > > > > > > >     start = DummyOperator(task_id='start')
> > > > > > > > > > > >     start >> download_group
> > > > > > > > > > > >     # this is equivalent to
> > > > > > > > > > > >     # start >> [task1, task2] >> task3
> > > > > > > > > > > > ```
> > > > > > > > > > > >
> > > > > > > > > > > > With this, we can still reuse a group of tasks and set
> > > > > > > > > > > > dependencies between them; it avoids the boilerplate code from
> > > > > > > > > > > > using SubDagOperator, and we can declare dependencies as
> > > > > > > > > > > > `task >> task_group >> task`.
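To make the `task >> task_group >> task` shorthand concrete, here is a small self-contained sketch of one way it could resolve to task-level edges. It is not the API from either PR: the `Task` and `TaskGroup` classes below are toy stand-ins, and the rule "an upstream task attaches to the group's roots, a downstream task attaches to its leaves" is an assumption made for illustration.

```python
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, other):
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def __rshift__(self, other):
        # Handles `task >> task` and `task >> group`.
        targets = other.roots() if isinstance(other, TaskGroup) else [other]
        for target in targets:
            self.set_downstream(target)
        return other


class TaskGroup:
    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = {}

    def add_task(self, task):
        self.tasks[task.task_id] = task
        return task

    def roots(self):
        # Tasks with no upstream task inside this group.
        return [t for t in self.tasks.values() if t.upstream.isdisjoint(self.tasks)]

    def leaves(self):
        # Tasks with no downstream task inside this group.
        return [t for t in self.tasks.values() if t.downstream.isdisjoint(self.tasks)]

    def __rshift__(self, other):
        # Handles `group >> task` and `group >> group`.
        targets = other.roots() if isinstance(other, TaskGroup) else [other]
        for leaf in self.leaves():
            for target in targets:
                leaf.set_downstream(target)
        return other
```

With `task1 >> task3` and `task2 >> task3` inside the group, `start >> group >> end` wires `start` to `task1` and `task2`, and `task3` to `end`; the group itself never appears as a node in the task graph.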
> > > > > > > > > > > >
> > > > > > > > > > > > User migration wise, we can introduce it before Airflow 2.0 and
> > > > > > > > > > > > allow gradual transition. Then we can decide if we still want to
> > > > > > > > > > > > keep the SubDagOperator or simply remove it.
> > > > > > > > > > > >
> > > > > > > > > > > > Any thoughts?
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Bin
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > > > > > > > > > > > maximebeauchemin@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1, proposal looks good.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The original intention was really to have task groups and a
> > > > > > > > > > > > > zoom-in/out in the UI. The original reasoning was to reuse the
> > > > > > > > > > > > > DAG object since it is a group of tasks, but as highlighted
> > > > > > > > > > > > > here it does create underlying confusions since a DAG is much
> > > > > > > > > > > > > more than just a group of tasks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Max
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > > > > > > > > > > joshipoornima06@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you for your email.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > > > > > > > > bin.huangxb@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - *Unpack SubDags during dag parsing*: This rewrites the
> > > > > > > > > > > > > > > > > *DagBag.bag_dag* method to unpack subdag while parsing,
> > > > > > > > > > > > > > > > > and it will give a flat structure at the task level
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The serialized_dag representation already does this I
> > > > > > > > > > > > > > > > think. At least if I've understood your idea here correctly.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I am not sure about the serialized_dag representation, but at
> > > > > > > > > > > > > > > least it will still keep the subdag entry in the DAG table? In
> > > > > > > > > > > > > > > my proposal, as also in the draft PR, the idea is to *extract
> > > > > > > > > > > > > > > the tasks from the subdag and add them back to the root_dag*.
> > > > > > > > > > > > > > > So the runtime DAG graph will look exactly the same as without
> > > > > > > > > > > > > > > subdag, but with metadata attached to those sections. This
> > > > > > > > > > > > > > > metadata will later be used for rendering in the UI. So after
> > > > > > > > > > > > > > > parsing (*DagBag.process_file()*), it will just output the
> > > > > > > > > > > > > > > *root_dag* instead of *root_dag + subdag + subdag + nested
> > > > > > > > > > > > > > > subdag* etc.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - e.g. section-1-* will have metadata current_group=section-1,
> > > > > > > > > > > > > > > parent_group=<the-root-dag-id> (naming suggestions welcome);
> > > > > > > > > > > > > > > the reason for parent_group is that we can have nested groups
> > > > > > > > > > > > > > > and still be able to capture the dependency.
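As a rough illustration of the flattening idea above (not the code in the draft PR), the following sketch walks a nested description of a DAG and emits every task at the top level with `current_group` / `parent_group` metadata. The `flatten` function and the list/tuple input shape are hypothetical; only the two metadata key names come from the message.

```python
def flatten(group_id, members, parent_group=None):
    """Yield (task_id, metadata) pairs for every task under `group_id`.

    members: list whose items are either a task_id string or a
             (child_group_id, child_members) tuple for a nested group.
    """
    for member in members:
        if isinstance(member, tuple):
            child_id, child_members = member
            # Recurse into the nested group; its parent is the current group.
            yield from flatten(child_id, child_members, parent_group=group_id)
        else:
            yield member, {"current_group": group_id, "parent_group": parent_group}
```

For example, flattening `["start", ("section-1", ["section-1-task-1"]), "end"]` under a root id of `example_dag` gives three top-level tasks, with `section-1-task-1` carrying `current_group="section-1"` and `parent_group="example_dag"`.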
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Runtime DAG:
> > > > > > > > > > > > > > > [image: image.png]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > While at the UI, what we see would be something like this by
> > > > > > > > > > > > > > > utilizing the metadata, and then we can expand or zoom into in
> > > > > > > > > > > > > > > some way.
> > > > > > > > > > > > > > > [image: image.png]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The benefits I can see are:
> > > > > > > > > > > > > > > 1. We don't need to deal with the extra complexity of SubDag
> > > > > > > > > > > > > > > for execution and scheduling. It will be the same as not using
> > > > > > > > > > > > > > > SubDag.
> > > > > > > > > > > > > > > 2. We still have the benefits of modularized and reusable dag
> > > > > > > > > > > > > > > code and can declare dependencies between them. And with the
> > > > > > > > > > > > > > > new SubDagOperator (see AIP or draft PR), we can use the same
> > > > > > > > > > > > > > > dag_factory function for generating 1 dag, a lot of dynamic
> > > > > > > > > > > > > > > dags, or for SubDag (in this case, it will just extract all
> > > > > > > > > > > > > > > underlying tasks and append them to the root dag).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Then it gets to the idea of replacing subdag with a simpler
> > > > > > > > > > > > > > > concept by Ash: the proposed change basically drains out the
> > > > > > > > > > > > > > > contents of a SubDag and becomes more like
> > > > > > > > > > > > > > > ExtractSubdagTasksAndAppendToRootdagOperator (forgive me about
> > > > > > > > > > > > > > > the crazy name..). In this case, is it still necessary to keep
> > > > > > > > > > > > > > > the concept of subdag, as it is nothing more than a name?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > That's why the TaskGroup idea comes up. Thanks Chris Palmer
> > > > > > > > > > > > > > > for helping conceptualize the functionality of TaskGroup, I
> > > > > > > > > > > > > > > will just paste it here.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - Tasks can be added to a TaskGroup
> > > > > > > > > > > > > > > > - You *can* have dependencies between Tasks in the same
> > > > > > > > > > > > > > > >   TaskGroup, but *cannot* have dependencies between a Task
> > > > > > > > > > > > > > > >   in a TaskGroup and either a Task in a different TaskGroup
> > > > > > > > > > > > > > > >   or a Task not in any group
> > > > > > > > > > > > > > > > - You *can* have dependencies between a TaskGroup and either
> > > > > > > > > > > > > > > >   other TaskGroups or Tasks not in any group
> > > > > > > > > > > > > > > > - The UI will by default render a TaskGroup as a single
> > > > > > > > > > > > > > > >   "object", but which you expand or zoom into in some way
> > > > > > > > > > > > > > > > - You'd need some way to determine what the "status" of a
> > > > > > > > > > > > > > > >   TaskGroup was at least for UI display purposes
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I agree with Chris:
> > > > > > > > > > > > > > > - From the backend's view (scheduler & executor), I think
> > > > > > > > > > > > > > >   TaskGroup should be ignored during execution (unless we
> > > > > > > > > > > > > > >   decide to implement some metadata operations that allow
> > > > > > > > > > > > > > >   starting/stopping a group of tasks etc.).
> > > > > > > > > > > > > > > - From the UI's view, it should be able to pick up the
> > > > > > > > > > > > > > >   individual tasks' status and then determine the TaskGroup's
> > > > > > > > > > > > > > >   status.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Bin
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Jun 12, 2020 at 10:28 AM Daniel
> Imberman
> > <
> > > > > > > > > > > > > > > daniel.imberman@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> I hadn’t thought about using the `>>` operator to tie dags
> > > > > > > > > > > > > > >> together but I think that sounds pretty great! I wonder if we
> > > > > > > > > > > > > > >> could essentially write in the ability to set dependencies to
> > > > > > > > > > > > > > >> all starter-tasks for that DAG.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> I’m personally ok with SubDag being a mostly UI concept. It
> > > > > > > > > > > > > > >> doesn’t need to execute separately, you’re just adding more
> > > > > > > > > > > > > > >> tasks to the queue that will be executed when there are
> > > > > > > > > > > > > > >> resources available.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <chris@crpalmer.com> wrote:
> > > > > > > > > > > > > > >> I agree that SubDAGs are an overly complex abstraction. I think
> > > > > > > > > > > > > > >> what is needed/useful is a TaskGroup concept. On a high level I
> > > > > > > > > > > > > > >> think you want this functionality:
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> - Tasks can be added to a TaskGroup
> > > > > > > > > > > > > > >> - You *can* have dependencies between Tasks in the same
> > > > > > > > > > > > > > >>   TaskGroup, but *cannot* have dependencies between a Task in
> > > > > > > > > > > > > > >>   a TaskGroup and either a Task in a different TaskGroup or a
> > > > > > > > > > > > > > >>   Task not in any group
> > > > > > > > > > > > > > >> - You *can* have dependencies between a TaskGroup and either
> > > > > > > > > > > > > > >>   other TaskGroups or Tasks not in any group
> > > > > > > > > > > > > > >> - The UI will by default render a TaskGroup as a single
> > > > > > > > > > > > > > >>   "object", but which you expand or zoom into in some way
> > > > > > > > > > > > > > >> - You'd need some way to determine what the "status" of a
> > > > > > > > > > > > > > >>   TaskGroup was at least for UI display purposes
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Not sure if it would need to be a top level object with its own
> > > > > > > > > > > > > > >> database table and model or just another attribute on tasks. I
> > > > > > > > > > > > > > >> think you could build it in a way such that from the scheduler's
> > > > > > > > > > > > > > >> point of view a DAG with TaskGroups doesn't get treated any
> > > > > > > > > > > > > > >> differently. So it really just becomes a shortcut for setting
> > > > > > > > > > > > > > >> dependencies between sets of Tasks, and allows the UI to
> > > > > > > > > > > > > > >> simplify the render of the DAG structure.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Chris
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov <ddavydov@twitter.com.invalid> wrote:
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> > Agree with James (and think it's actually
> the
> > > more
> > > > > > > > important
> > > > > > > > > > > issue
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > >> fix),
> > > > > > > > > > > > > > >> > but I am still convinced Ash' idea is the
> > right
> > > > way
> > > > > > > > forward
> > > > > > > > > > > (just
> > > > > > > > > > > > it
> > > > > > > > > > > > > > >> might
> > > > > > > > > > > > > > >> > require a bit more work to deprecate than
> > adding
> > > > > > visual
> > > > > > > > > > grouping
> > > > > > > > > > > > in
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > >> > UI).
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > There was a previous thread about this FYI
> > with
> > > > more
> > > > > > > > context
> > > > > > > > > > on
> > > > > > > > > > > > why
> > > > > > > > > > > > > > >> subdags
> > > > > > > > > > > > > > >> > are bad and potential solutions:
> > > > > > > > > > > > > > >> >
> > > > > > > > > > >
> > > > > >
> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > > > > > > > > > > . A
> > > > > > > > > > > > > > >> > solution I outline there to Jame's problem
> is
> > > e.g.
> > > > > > > > enabling
> > > > > > > > > > the
> > > > > > > > > > > >>
> > > > > > > > > > > > > > >> operator
> > > > > > > > > > > > > > >> > for Airflow operators to work with DAGs as
> > > well. I
> > > > > see
> > > > > > > > this
> > > > > > > > > > > being
> > > > > > > > > > > > > > >> separate
> > > > > > > > > > > > > > >> > from Ash' solution for DAG grouping in the
> UI
> > > but
> > > > > one
> > > > > > of
> > > > > > > > the
> > > > > > > > > > two
> > > > > > > > > > > > > items
> > > > > > > > > > > > > > >> > required to replace all existing subdag
> > > > > functionality.
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > I've been working with subdags for 3 years
> and
> > > > they
> > > > > > are
> > > > > > > > > > always a
> > > > > > > > > > > > > giant
> > > > > > > > > > > > > > >> pain
> > > > > > > > > > > > > > >> > to use. They are a constant source of user
> > > > confusion
> > > > > > and
> > > > > > > > > > > breakages
> > > > > > > > > > > > > > >> during
> > > > > > > > > > > > > > >> > upgrades. Would love to see them gone :).
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > On Fri, Jun 12, 2020 at 11:11 AM James
> Coder <
> > > > > > > > > > > jcoder01@gmail.com>
> > > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > > I'm not sure I totally agree it's just a
> UI
> > > > > > concept. I
> > > > > > > > use
> > > > > > > > > > the
> > > > > > > > > > > > > > subdag
> > > > > > > > > > > > > > >> > > operator to simplify dependencies too. If
> > you
> > > > > have a
> > > > > > > > group
> > > > > > > > > > of
> > > > > > > > > > > > > tasks
> > > > > > > > > > > > > > >> that
> > > > > > > > > > > > > > >> > > need to finish before another group of
> tasks
> > > > > start,
> > > > > > > > using
> > > > > > > > > a
> > > > > > > > > > > > subdag
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > >> a
> > > > > > > > > > > > > > >> > > pretty quick way to set those dependencies
> > > and I
> > > > > > think
> > > > > > > > > also
> > > > > > > > > > > make
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > >> > easier
> > > > > > > > > > > > > > >> > > to follow the dag code.
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > On Fri, Jun 12, 2020 at 9:53 AM Kyle
> Hamlin
> > <
> > > > > > > > > > > > hamlin.kn@gmail.com>
> > > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > > I second Ash’s grouping concept.
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > > > Berlin-Taylor
> > > > > > <
> > > > > > > > > > > > > ash@apache.org
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > > Question:
> > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > > Do we even need the SubDagOperator
> > > anymore?
> > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > > Would removing it entirely and just
> > > > replacing
> > > > > it
> > > > > > > > with
> > > > > > > > > a
> > > > > > > > > > UI
> > > > > > > > > > > > > > >> grouping
> > > > > > > > > > > > > > >> > > > > concept be conceptually simpler, less
> to
> > > get
> > > > > > > wrong,
> > > > > > > > > and
> > > > > > > > > > > > closer
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > >> > what
> > > > > > > > > > > > > > >> > > > > users actually want to achieve with
> > > subdags?
> > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > > With your proposed change, tasks in
> > > subdags
> > > > > > could
> > > > > > > > > start
> > > > > > > > > > > > > running
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > >> > > > > parallel (a good change) -- so should
> we
> > > not
> > > > > > also
> > > > > > > > just
> > > > > > > > > > > > > > > _entirely_
> > > > > > > > > > > > > > >> > > remove
> > > > > > > > > > > > > > >> > > > > the concept of a sub dag and replace
> it
> > > with
> > > > > > > > something
> > > > > > > > > > > > > simpler.
> > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > > Problems with subdags (I think. I
> > haven't
> > > > used
> > > > > > > them
> > > > > > > > > > > > > extensively
> > > > > > > > > > > > > > so
> > > > > > > > > > > > > > >> > may
> > > > > > > > > > > > > > >> > > > > be wrong on some of these):
> > > > > > > > > > > > > > >> > > > > - They need their own dag_id, but it
> > > has(?)
> > > > to
> > > > > > be
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > > > form
> > > > > > > > > > > > > > >> > > > > `parent_dag_id.subdag_id`.
> > > > > > > > > > > > > > >> > > > > - They need their own
> schedule_interval,
> > > but
> > > > > it
> > > > > > > has
> > > > > > > > to
> > > > > > > > > > > match
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > >> > parent
> > > > > > > > > > > > > > >> > > > dag
> > > > > > > > > > > > > > >> > > > > - Sub dags can be paused on their own.
> > > (Does
> > > > > it
> > > > > > > make
> > > > > > > > > > sense
> > > > > > > > > > > > to
> > > > > > > > > > > > > do
> > > > > > > > > > > > > > >> > this?
> > > > > > > > > > > > > > >> > > > > Pausing just a sub dag would mean the
> > sub
> > > > dag
> > > > > > > would
> > > > > > > > > > never
> > > > > > > > > > > > > > >> execute, so
> > > > > > > > > > > > > > >> > > > > the SubDagOperator would fail too.
> > > > > > > > > > > > > > >> > > > > - You had to choose the executor to
> > > > operator a
> > > > > > > > subdag
> > > > > > > > > > with
> > > > > > > > > > > > --
> > > > > > > > > > > > > > >> always
> > > > > > > > > > > > > > >> > a
> > > > > > > > > > > > > > >> > > > > bit of a kludge.
> > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > > Thoughts?
> > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > > -ash
> > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > > On Jun 12 2020, at 12:01 pm, Ash
> > > > > Berlin-Taylor <
> > > > > > > > > > > > > ash@apache.org>
> > > > > > > > > > > > > > >> > wrote:
> > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > > >> > > > > > Work on sub-dags is much needed, I'm
> > > > excited
> > > > > to
> > > > > > > see
> > > > > > > > > how
> > > > > > > > > > > > this
> > > > > > > > > > > > > > >> > > progresses.
> > > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > > >> > > > > >> - *Unpack SubDags during dag
> > parsing*:
> > > > This
> > > > > > > > > rewrites
> > > > > > > > > > > the
> > > > > > > > > > > > > > >> > > > > *DagBag.bag_dag*
> > > > > > > > > > > > > > >> > > > > >> method to unpack subdag while
> > parsing,
> > > > and
> > > > > it
> > > > > > > > will
> > > > > > > > > > > give a
> > > > > > > > > > > > > > flat
> > > > > > > > > > > > > > >> > > > > >> structure at
> > > > > > > > > > > > > > >> > > > > >> the task level
> > > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > > >> > > > > > The serialized_dag representation
> > > already
> > > > > does
> > > > > > > > this
> > > > > > > > > I
> > > > > > > > > > > > think.
> > > > > > > > > > > > > > At
> > > > > > > > > > > > > > >> > least
> > > > > > > > > > > > > > >> > > > if
> > > > > > > > > > > > > > >> > > > > > I've understood your idea here
> > > correctly.
> > > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > > >> > > > > > -ash
> > > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > > >> > > > > > On Jun 12 2020, at 9:51 am, Xinbin
> > > Huang <
> > > > > > > > > > > > > > bin.huangxb@gmail.com
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > > > wrote:
> > > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > > >> > > > > >> Hi everyone,
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > > >> Sending a message to everyone and
> > > collect
> > > > > > > > feedback
> > > > > > > > > on
> > > > > > > > > > > the
> > > > > > > > > > > > > > >> AIP-34
> > > > > > > > > > > > > > >> > on
> > > > > > > > > > > > > > >> > > > > >> rewriting SubDagOperator. This was
> > > > > previously
> > > > > > > > > briefly
> > > > > > > > > > > > > > >> mentioned in
> > > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > > >> > > > > >> discussion about what needs to be
> > done
> > > > for
> > > > > > > > Airflow
> > > > > > > > > > 2.0,
> > > > > > > > > > > > and
> > > > > > > > > > > > > > >> one of
> > > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > > >> > > > > >> ideas is to make SubDagOperator
> > attach
> > > > > tasks
> > > > > > > back
> > > > > > > > > to
> > > > > > > > > > > the
> > > > > > > > > > > > > root
> > > > > > > > > > > > > > >> DAG.
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > > >> This AIP-34 focuses on solving
> > > > > SubDagOperator
> > > > > > > > > related
> > > > > > > > > > > > > issues
> > > > > > > > > > > > > > by
> > > > > > > > > > > > > > >> > > > > reattaching
> > > > > > > > > > > > > > >> > > > > >> all tasks back to the root dag
> while
> > > > > > respecting
> > > > > > > > > > > > > dependencies
> > > > > > > > > > > > > > >> > during
> > > > > > > > > > > > > > >> > > > > >> parsing. The original grouping
> effect
> > > on
> > > > > the
> > > > > > UI
> > > > > > > > > will
> > > > > > > > > > be
> > > > > > > > > > > > > > >> achieved
> > > > > > > > > > > > > > >> > > > through
> > > > > > > > > > > > > > >> > > > > >> grouping related tasks by metadata.
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > > >> This also makes the dag_factory
> > > function
> > > > > more
> > > > > > > > > > reusable
> > > > > > > > > > > > > > because
> > > > > > > > > > > > > > >> you
> > > > > > > > > > > > > > >> > > > don't
> > > > > > > > > > > > > > >> > > > > >> need to have parent_dag_name and
> > > > > > child_dag_name
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > > > > function
> > > > > > > > > > > > > > >> > > > > signature
> > > > > > > > > > > > > > >> > > > > >> anymore.
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > > >> Changes proposed:
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > > >> - *Unpack SubDags during dag
> > parsing*:
> > > > This
> > > > > > > > > rewrites
> > > > > > > > > > > the
> > > > > > > > > > > > > > >> > > > > *DagBag.bag_dag*
> > > > > > > > > > > > > > >> > > > > >> method to unpack subdag while
> > parsing,
> > > > and
> > > > > it
> > > > > > > > will
> > > > > > > > > > > give a
> > > > > > > > > > > > > > flat
> > > > > > > > > > > > > > >> > > > > >> structure at
> > > > > > > > > > > > > > >> > > > > >> the task level
> > > > > > > > > > > > > > >> > > > > >> - *Simplify SubDagOperator*: The
> new
> > > > > > > > SubDagOperator
> > > > > > > > > > > acts
> > > > > > > > > > > > > > like a
> > > > > > > > > > > > > > >> > > > > >> container and most of the original
> > > > methods
> > > > > > are
> > > > > > > > > > removed.
> > > > > > > > > > > > The
> > > > > > > > > > > > > > >> > > > > >> signature is
> > > > > > > > > > > > > > >> > > > > >> also changed to *subdag_factory
> *with
> > > > > > > > *subdag_args
> > > > > > > > > > *and
> > > > > > > > > > > > > > >> > > > > *subdag_kwargs*.
> > > > > > > > > > > > > > >> > > > > >> This is similar to the
> PythonOperator
> > > > > > > signature.
> > > > > > > > > > > > > > >> > > > > >> - *Add a TaskGroup model and add
> > > > > > current_group
> > > > > > > &
> > > > > > > > > > > > > parent_group
> > > > > > > > > > > > > > >> > > > > attributes
> > > > > > > > > > > > > > >> > > > > >> to BaseOperator*: This metadata is
> > used
> > > > to
> > > > > > > group
> > > > > > > > > > tasks
> > > > > > > > > > > > for
> > > > > > > > > > > > > > >> > > > > >> rendering at
> > > > > > > > > > > > > > >> > > > > >> UI level. It may potentially extend
> > > > further
> > > > > > to
> > > > > > > > > group
> > > > > > > > > > > > > > arbitrary
> > > > > > > > > > > > > > >> > > tasks
> > > > > > > > > > > > > > >> > > > > >> outside the context of subdag to
> > allow
> > > > > > > > group-level
> > > > > > > > > > > > > operations
> > > > > > > > > > > > > > >> > > (i.e.
> > > > > > > > > > > > > > >> > > > > >> stop/trigger a group of task within
> > the
> > > > > dag)
> > > > > > > > > > > > > > >> > > > > >> - *Webserver UI for SubDag*:
> Proposed
> > > UI
> > > > > > > > > modification
> > > > > > > > > > > to
> > > > > > > > > > > > > > allow
> > > > > > > > > > > > > > >> > > > > >> (un)collapse a group of tasks for a
> > > flat
> > > > > > > > structure
> > > > > > > > > to
> > > > > > > > > > > > pair
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > > >> > > > > first
> > > > > > > > > > > > > > >> > > > > >> change instead of the original
> > > > hierarchical
> > > > > > > > > > structure.
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > > >> Please see related documents and
> PRs
> > > for
> > > > > > > details:
> > > > > > > > > > > > > > >> > > > > >> AIP:
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > > >> Original Issue:
> > > > > > > > > > > > > > https://github.com/apache/airflow/issues/8078
> > > > > > > > > > > > > > >> > > > > >> Draft PR:
> > > > > > > > > > https://github.com/apache/airflow/pull/9243
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > > >> Please let me know if there are any
> > > > aspects
> > > > > > > that
> > > > > > > > > you
> > > > > > > > > > > > > > >> > agree/disagree
> > > > > > > > > > > > > > >> > > > > >> with or
> > > > > > > > > > > > > > >> > > > > >> need more clarification (especially
> > the
> > > > > third
> > > > > > > > > change
> > > > > > > > > > > > > > regarding
> > > > > > > > > > > > > > >> > > > > TaskGroup).
> > > > > > > > > > > > > > >> > > > > >> Any comments are welcome and I am
> > > looking
> > > > > > > forward
> > > > > > > > > to
> > > > > > > > > > > it!
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > > >> Cheers
> > > > > > > > > > > > > > >> > > > > >> Bin
> > > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > > > >> > > > Kyle Hamlin
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Thanks & Regards
> > > > > > > > > > > > > > Poornima
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Jarek Potiuk
> > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > >
> > > > > > M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > > <+48%20660%20796%20129>>
> > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >
> > > > M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > > <+48%20660%20796%20129>>
> > > > [image: Polidea] <https://www.polidea.com/>
> > > >
> > >
> > >
> > > --
> > >
> > > *Jacob Ferriero*
> > >
> > > Strategic Cloud Engineer: Data Engineering
> > >
> > > jferriero@google.com
> > >
> > > 617-714-2509
> > >
> >
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by Xinbin Huang <bi...@gmail.com>.
Hi Yu,

Thank you so much for taking this on. I was fairly distracted previously
and I didn't have the time to update the proposal. In fact, after
discussing with Ash, Kaxil and Daniel, the direction of this AIP has been
changed to favor the concept of TaskGroup instead of rewriting
SubDagOperator (though it may make sense to deprecate SubDag at a future
date).
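
To make the TaskGroup direction concrete, here is a minimal, self-contained
sketch of "grouping as a UI concept". It is only an illustration under stated
assumptions: it does not use Airflow, and the names Task, TaskGroup, link and
collapsed_edges are hypothetical, not the API from either PR. The point it
tries to show is that every task stays in one flat graph, a group is just a
label plus a shortcut for setting dependencies, and the collapsed view is
derived from the flat edges.

```python
class Task:
    """Stand-in for an operator: an id, a UI-only group label, and edges."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.group_id = None  # None means "not in any group"
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, other):
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)


class TaskGroup:
    """Hypothetical grouping container; it carries no scheduling logic."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = {}

    def add(self, task):
        task.group_id = self.group_id
        self.tasks[task.task_id] = task
        return task

    def roots(self):
        # Tasks with no upstream task inside this group.
        return [t for t in self.tasks.values() if t.upstream.isdisjoint(self.tasks)]

    def leaves(self):
        # Tasks with no downstream task inside this group.
        return [t for t in self.tasks.values() if t.downstream.isdisjoint(self.tasks)]


def link(upstream, downstream):
    """Set upstream >> downstream, where either side may be a Task or a TaskGroup."""
    ups = upstream.leaves() if isinstance(upstream, TaskGroup) else [upstream]
    downs = downstream.roots() if isinstance(downstream, TaskGroup) else [downstream]
    for u in ups:
        for d in downs:
            u.set_downstream(d)


def collapsed_edges(tasks):
    """Edges as a UI could draw them when every group is collapsed to one node."""
    by_id = {t.task_id: t for t in tasks}
    edges = set()
    for t in tasks:
        for d_id in t.downstream:
            src = t.group_id or t.task_id
            dst = by_id[d_id].group_id or d_id
            if src != dst:  # edges inside a collapsed group are hidden
                edges.add((src, dst))
    return sorted(edges)


start, end = Task("start"), Task("end")
download = TaskGroup("download")
a = download.add(Task("download_a"))
b = download.add(Task("download_b"))
merge = download.add(Task("merge"))
link(a, merge)
link(b, merge)
link(start, download)  # expands to start >> [download_a, download_b]
link(download, end)    # expands to merge >> end

all_tasks = [start, a, b, merge, end]
flat = sorted((t.task_id, d) for t in all_tasks for d in t.downstream)
collapsed = collapsed_edges(all_tasks)
```

In this sketch a scheduler would only ever look at the flat edges; the
collapsed edges exist purely for rendering, which is why nothing about
execution needs to change when groups are expanded or collapsed.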

Your PR is amazing and it has implemented the desired features. I think we
can focus on your new PR instead. Do you mind updating the AIP based on
what you have done in your PR?

Best,
Bin


On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yu...@gmail.com> wrote:

> Hi, all, I've added the basic UI changes to my proposed implementation of
> TaskGroup as UI grouping concept:
> https://github.com/apache/airflow/pull/10153
>
> I think Chris had a pretty good specification of TaskGroup so I'm quoting
> it here. The only thing I don't fully agree with is the restriction
> "... *cannot* have dependencies between a Task in a TaskGroup and either a
> Task in a different TaskGroup or a Task not in any group". I think
> this is overly restrictive. Since TaskGroup is a UI concept, tasks can have
> dependencies on tasks in other TaskGroups or not in any TaskGroup. In my PR,
> this is allowed. The graph edges will update accordingly when TaskGroups
> are expanded/collapsed. TaskGroup is only helping to make the UI look less
> crowded. Under the hood, everything is still a DAG of tasks and edges so
> things work normally. Here's a screenshot
> <
> https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif
> >
> of the UI interaction.
>
> - Tasks can be added to a TaskGroup
> - You *can* have dependencies between Tasks in the same TaskGroup, but
>   *cannot* have dependencies between a Task in a TaskGroup and either a
>   Task in a different TaskGroup or a Task not in any group
> - You *can* have dependencies between a TaskGroup and either other
>   TaskGroups or Tasks not in any group
> - The UI will by default render a TaskGroup as a single "object", but
>   which you expand or zoom into in some way
> - You'd need some way to determine what the "status" of a TaskGroup was
>   at least for UI display purposes
>
>
> Regarding Jake's comment, I agree it's possible to implement the "retrying
> tasks in a group" pattern he mentioned as an optional feature of TaskGroup
> although that may go against having TaskGroup as a pure UI concept. For the
> motivating example Jake provided, I suggest implementing both
> SubmitLongRunningJobTask and PollJobStatusSensor in a single operator. It
> can do something like BaseSensorOperator.execute() does in "reschedule"
> mode, i.e. it first executes some code to submit the long-running job to
> the external service and stores the state (e.g. in XCom), then reschedules
> itself. Subsequent runs then poke for the completion state.
>
>
> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero <jferriero@google.com.invalid
> >
> wrote:
>
> > I really like this idea of a TaskGroup container as I think this will be
> > much easier to use than SubDag.
> >
> > I'd like to propose an optional behavior for special retry mechanics via
> a
> > TaskGroup.retry_all property.
> > This way I could use TaskGroup to replace my favorite use of SubDag for
> > atomically retrying tasks of the pattern "act on external state then
> > reschedule poll until desired state reached".
> >
> > The motivating use case I have for a SubDag is a very simple two-task group
> > [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > I use SubDag because it gives me an easy way to retry the
> SubmitJobTask
> > if something about the PollJobSensor fails.
> > This pattern would be really nice for jobs that are expected to run a
> long
> > time (because the sensor can use reschedule mode, freeing up slots)
> > but might fail for a retryable reason.
> > However, using SubDag to meet this use case defeats the purpose because
> > SubDag infamously
> > <
> >
> https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10
> > >
> > blocks a "controller" slot for the entire duration.
> > This may feel like a cyclic behavior but in reality it is very common for a
> > single operator to submit job / wait til done.
> > We could use this case to refactor many operators (e.g. BQ, Dataproc,
> > Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with an
> > optional reschedule mode if user knows that this job may take a long
> time.
> >
> > I'd be happy to do the development work on adding this specific retry
> behavior
> > to TaskGroup once the base concept is implemented if others in the
> > community would find this a useful feature.
> >
> > Cheers,
> > Jake
> >
> > On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <Ja...@polidea.com>
> > wrote:
> >
> > > All for it :) . I think we are getting closer to have regular planning
> > and
> > > making some structured approach to 2.0 and starting task force for it
> > soon,
> > > so I think this should be perfectly fine to discuss and even start
> > > implementing what's beyond as soon as we make sure that we are
> > prioritizing
> > > 2.0 work.
> > >
> > > J,
> > >
> > >
> > > On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yu...@gmail.com> wrote:
> > >
> > > > Hi Jarek,
> > > >
> > > > I agree we should not change the behaviour of the existing
> > SubDagOperator
> > > > till Airflow 2.1. Is it okay to continue the discussion about
> TaskGroup
> > > as
> > > > a brand new concept/feature independent from the existing
> > SubDagOperator?
> > > > In other words, shall we add TaskGroup as a UI grouping concept like
> > Ash
> > > > suggested, and not touch SubDagOperator at all. Whenever we are
> ready
> > > with
> > > > TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
> > > >
> > > > I really like Ash's idea of simplifying the SubDagOperator idea into
> a
> > > > simple UI grouping concept. I think Xinbin's idea of "reattaching all
> > the
> > > > tasks to the root DAG" is the way to go. And I see James pointed out
> we
> > > > need some helper functions to simplify dependencies setting of
> > TaskGroup.
> > > > Xinbin put up a pretty elegant example in his PR
> > > > <https://github.com/apache/airflow/pull/9243>. I think having
> > TaskGroup
> > > as
> > > > a UI concept should be a relatively small change. We can simplify
> > > Xinbin's
> > > > PR further. So I put up this alternative proposal here:
> > > > https://github.com/apache/airflow/pull/10153
> > > >
> > > > I have not done any UI changes due to lack of experience with web UI.
> > If
> > > > anyone's interested, please take a look at the PR.
> > > >
> > > > Qian
> > > >
> > > > On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <
> Jarek.Potiuk@polidea.com
> > >
> > > > wrote:
> > > >
> > > > > Similar point here to the other ideas that are popping up. Maybe we
> > > > should
> > > > > just focus on completing 2.0 and make all discussions about further
> > > > > improvements to 2.1? While those are important discussions (and we
> > > should
> > > > > continue them in the  near future !) I think at this point focusing
> > on
> > > > > delivering 2.0 in its current shape should be our focus now ?
> > > > >
> > > > > J.
> > > > >
> > > > > On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <
> bin.huangxb@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Daniel
> > > > > >
> > > > > > I agree that the TaskGroup should have the same API as a DAG
> object
> > > > > related
> > > > > > to task dependencies, but it will not have anything related to
> > actual
> > > > > > execution or scheduling.
> > > > > > I will update the AIP according to this over the weekend.
> > > > > >
> > > > > > > We could even make a “DAGTemplate” object s.t. when you import
> > the
> > > > > object
> > > > > > you can import it with parameters to determine the shape of the
> > DAG.
> > > > > >
> > > > > > Can you elaborate a bit more on this? Does it serve a similar
> > purpose
> > > > as
> > > > > a
> > > > > > DAG factory function?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > > > > daniel.imberman@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > Why not give the TaskGroup the same API as a DAG object (e.g.
> the
> > > > > bitwise
> > > > > > > operator for task dependencies)? We could even make a
> > “DAGTemplate”
> > > > > > object
> > > > > > > s.t. when you import the object you can import it with
> parameters
> > > to
> > > > > > > determine the shape of the DAG.
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <
> > > bin.huangxb@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > The TaskGroup will not take schedule interval as a parameter
> > > itself,
> > > > > and
> > > > > > it
> > > > > > > depends on the DAG where it attaches to. In my opinion, the
> > > TaskGroup
> > > > > > will
> > > > > > > only contain a group of tasks with interdependencies, and the
> > > > TaskGroup
> > > > > > > behaves like a task. It doesn't contain any
> execution/scheduling
> > > > logic
> > > > > > > (i.e. schedule_interval, concurrency, max_active_runs etc.)
> like
> > a
> > > > DAG
> > > > > > > does.
> > > > > > >
> > > > > > > > For example, there is the scenario that the schedule interval
> > of
> > > > DAG
> > > > > is
> > > > > > > 1 hour and the schedule interval of TaskGroup is 20 min.
> > > > > > >
> > > > > > > I am curious why you ask this. Is this a use case that you want
> > to
> > > > > > achieve?
> > > > > > >
> > > > > > > Bin
> > > > > > >
> > > > > > > On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <thanosxnicholas@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > > Using TaskGroup, Is the schedule interval of TaskGroup the
> same
> > > as
> > > > > the
> > > > > > > > parent DAG? My main concern is whether the schedule interval
> of
> > > > > > TaskGroup
> > > > > > > > could be different with that of the DAG? For example, there
> is
> > > the
> > > > > > > scenario
> > > > > > > > that the schedule interval of DAG is 1 hour and the schedule
> > > > interval
> > > > > > of
> > > > > > > > TaskGroup is 20 min.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Nicholas
> > > > > > > >
> > > > > > > > On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > > > bin.huangxb@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Nicholas,
> > > > > > > > >
> > > > > > > > > I am not sure about the old behavior of SubDagOperator,
> maybe
> > > it
> > > > > will
> > > > > > > > throw
> > > > > > > > > an error? But in the original proposal, the subdag's
> > > > > > schedule_interval
> > > > > > > > will
> > > > > > > > > be ignored. Or if we decide to use TaskGroup to replace
> > SubDag,
> > > > > there
> > > > > > > > will
> > > > > > > > > be no subdag schedule_interval.
> > > > > > > > >
> > > > > > > > > Bin
> > > > > > > > >
> > > > > > > > > On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> > thanosxnicholas@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > > Thanks for your good proposal. I was confused whether the
> > > > > schedule
> > > > > > > > > > interval of SubDAG is different from that of the parent
> > DAG?
> > > I
> > > > > have
> > > > > > > > > > discussed with Jiajie Zhong about the schedule interval
> of
> > > > > SubDAG.
> > > > > > If
> > > > > > > > the
> > > > > > > > > > SubDagOperator has a different schedule interval, what
> will
> > > > > happen
> > > > > > > for
> > > > > > > > > the
> > > > > > > > > > scheduler to schedule the parent DAG?
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Nicholas Jiang
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > > > > bin.huangxb@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thank you, Max, Kaxil, and everyone's feedback!
> > > > > > > > > > >
> > > > > > > > > > > I have rethought about the concept of subdag and task
> > > > groups. I
> > > > > > > think
> > > > > > > > > the
> > > > > > > > > > > better way to approach this is to entirely remove
> subdag
> > > and
> > > > > > > > introduce
> > > > > > > > > > the
> > > > > > > > > > > concept of TaskGroup, which is a container of tasks
> along
> > > > with
> > > > > > > their
> > > > > > > > > > > dependencies *without execution/scheduling logic as a
> > DAG*.
> > > > The
> > > > > > > only
> > > > > > > > > > > purpose of it is to group a list of tasks, but you
> still
> > > need
> > > > > to
> > > > > > > add
> > > > > > > > it
> > > > > > > > > > to
> > > > > > > > > > > a DAG for execution.
> > > > > > > > > > >
> > > > > > > > > > > Here is a small code snippet.
> > > > > > > > > > >
> > > > > > > > > > > ```
> > > > > > > > > > > class TaskGroup:
> > > > > > > > > > > """
> > > > > > > > > > > A TaskGroup contains a group of tasks.
> > > > > > > > > > >
> > > > > > > > > > > If default_args is missing, it will take default args
> > from
> > > > the
> > > > > > > > DAG.
> > > > > > > > > > > """
> > > > > > > > > > > def __init__(self, group_id, default_args):
> > > > > > > > > > > pass
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > """
> > > > > > > > > > > You can add tasks to a task group similar to adding
> tasks
> > > to
> > > > a
> > > > > > DAG
> > > > > > > > > > >
> > > > > > > > > > > This can be declared in a separate file from the dag
> file
> > > > > > > > > > > """
> > > > > > > > > > > download_group = TaskGroup(group_id='download',
> > > > > > > > > > default_args=default_args)
> > > > > > > > > > > download_group.add_task(task1)
> > > > > > > > > > > task2.dag = download_group
> > > > > > > > > > >
> > > > > > > > > > > with download_group:
> > > > > > > > > > > task3 = DummyOperator(task_id='task3')
> > > > > > > > > > >
> > > > > > > > > > > [task, task2] >> task3
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > """Add it to a DAG for execution"""
> > > > > > > > > > > with DAG(dag_id='start_download_dag',
> > > > > default_args=default_args,
> > > > > > > > > > > schedule_interval='@daily', ...) as dag:
> > > > > > > > > > > start = DummyOperator(task_id='start')
> > > > > > > > > > > start >> download_group
> > > > > > > > > > > # this is equivalent to
> > > > > > > > > > > # start >> [task, task2] >> task3
> > > > > > > > > > > ```
> > > > > > > > > > >
> > > > > > > > > > > With this, we can still reuse a group of tasks and set
> > > > > > dependencies
> > > > > > > > > > between
> > > > > > > > > > > them; it avoids the boilerplate code from using
> > > > SubDagOperator,
> > > > > > and
> > > > > > > > we
> > > > > > > > > > can
> > > > > > > > > > > declare dependencies as `task >> task_group >> task`.
> > > > > > > > > > >
> > > > > > > > > > > User migration wise, we can introduce it before Airflow
> > 2.0
> > > > and
> > > > > > > allow
> > > > > > > > > > > gradual transition. Then we can decide if we still want
> > to
> > > > keep
> > > > > > the
> > > > > > > > > > > SubDagOperator or simply remove it.
> > > > > > > > > > >
> > > > > > > > > > > Any thoughts?
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Bin
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > > > > > > > > > > maximebeauchemin@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1, proposal looks good.
> > > > > > > > > > > >
> > > > > > > > > > > > The original intention was really to have tasks
> groups
> > > and
> > > > a
> > > > > > > > > > zoom-in/out
> > > > > > > > > > > in
> > > > > > > > > > > > the UI. The original reasoning was to reuse the DAG
> > > object
> > > > > > since
> > > > > > > it
> > > > > > > > > is
> > > > > > > > > > a
> > > > > > > > > > > > group of tasks, but as highlighted here it does
> create
> > > > > > underlying
> > > > > > > > > > > > confusions since a DAG is much more than just a group
> > of
> > > > > tasks.
> > > > > > > > > > > >
> > > > > > > > > > > > Max
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > > > > > > > > > joshipoornima06@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thank you for your email.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > > > > > > > bin.huangxb@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > > - *Unpack SubDags during dag parsing*: This
> > > > rewrites
> > > > > > the
> > > > > > > > > > > > > > *DagBag.bag_dag*
> > > > > > > > > > > > > > > > method to unpack subdag while parsing, and it
> > > will
> > > > > > give a
> > > > > > > > > > flat
> > > > > > > > > > > > > > > > structure at
> > > > > > > > > > > > > > > > the task level
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The serialized_dag representation already does
> > > this I
> > > > > > > think.
> > > > > > > > At
> > > > > > > > > > > least
> > > > > > > > > > > > > if
> > > > > > > > > > > > > > > I've understood your idea here correctly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I am not sure about serialized_dag
> representation,
> > > but
> > > > at
> > > > > > > least
> > > > > > > > > it
> > > > > > > > > > > will
> > > > > > > > > > > > > > still keep the subdag entry in the DAG table? In
> my
> > > > > > proposal
> > > > > > > as
> > > > > > > > > > also
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > draft PR, the idea is to *extract the tasks from
> > the
> > > > > subdag
> > > > > > > and
> > > > > > > > > add
> > > > > > > > > > > > them
> > > > > > > > > > > > > > back to the root_dag. *So the runtime DAG graph
> > will
> > > > look
> > > > > > > > exactly
> > > > > > > > > > the
> > > > > > > > > > > > > > same as without subdag but with metadata attached
> > to
> > > > > those
> > > > > > > > > > sections.
> > > > > > > > > > > > > These
> > > > > > > > > > > > > > metadata will be later on used to render in the
> UI.
> > > So
> > > > > > after
> > > > > > > > > > parsing
> > > > > > > > > > > (
> > > > > > > > > > > > > > *DagBag.process_file()*), it will just output the
> > > > > *root_dag
> > > > > > > > > > *instead
> > > > > > > > > > > of
> > > > > > > > > > > > > *root_dag +
> > > > > > > > > > > > > > subdag + subdag + nested subdag* etc.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - e.g. section-1-* will have metadata
> > > > > > > > current_group=section-1,
> > > > > > > > > > > > > > parent_group=<the-root-dag-id> (welcome for
> naming
> > > > > > > > > suggestions),
> > > > > > > > > > > the
> > > > > > > > > > > > > > reason for parent_group is that we can have
> nested
> > > > group
> > > > > > and
> > > > > > > > > > still
> > > > > > > > > > > > be
> > > > > > > > > > > > > > able to capture the dependency.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Runtime DAG:
> > > > > > > > > > > > > > [image: image.png]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > While at the UI, what we see would be something
> > like
> > > > this
> > > > > > by
> > > > > > > > > > > utilizing
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > metadata, and then we can expand or zoom into in
> > some
> > > > > way.
> > > > > > > > > > > > > > [image: image.png]
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The benefits I can see is that:
> > > > > > > > > > > > > > 1. We don't need to deal with the extra
> complexity
> > of
> > > > > > SubDag
> > > > > > > > for
> > > > > > > > > > > > > execution
> > > > > > > > > > > > > > and scheduling. It will be the same as not using
> > > > SubDag.
> > > > > > > > > > > > > > 2. Still have the benefits of modularized and
> > > reusable
> > > > > dag
> > > > > > > code
> > > > > > > > > and
> > > > > > > > > > > > > > declare dependencies between them. And with the
> new
> > > > > > > > > SubDagOperator
> > > > > > > > > > > (see
> > > > > > > > > > > > > AIP
> > > > > > > > > > > > > > or draft PR), we can use the same dag_factory
> > > function
> > > > > for
> > > > > > > > > > > generating 1
> > > > > > > > > > > > > > dag, a lot of dynamic dags, or used for SubDag
> (in
> > > this
> > > > > > case,
> > > > > > > > it
> > > > > > > > > > will
> > > > > > > > > > > > > just
> > > > > > > > > > > > > > extract all underlying tasks and append to the
> root
> > > > dag).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Then it gets to the idea of replacing subdag
> > with a
> > > > > > > > simpler
> > > > > > > > > > > > concept
> > > > > > > > > > > > > > by Ash: the proposed change basically drains out
> > the
> > > > > > > > contents
> > > > > > > > > > of
> > > > > > > > > > > a
> > > > > > > > > > > > > SubDag
> > > > > > > > > > > > > > and becomes more like
> > > > > > > > > > ExtractSubdagTasksAndAppendToRootdagOperator
> > > > > > > > > > > > > (forgive
> > > > > > > > > > > > > > me about the crazy name..). In this case, it is
> > still
> > > > > > > > > necessary
> > > > > > > > > > to
> > > > > > > > > > > > > keep the
> > > > > > > > > > > > > > concept of subdag as it is nothing more than a
> > name?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > That's why the TaskGroup idea comes up. Thanks
> > Chris
> > > > > Palmer
> > > > > > > for
> > > > > > > > > > > helping
> > > > > > > > > > > > > > conceptualize the functionality of TaskGroup, I
> > will
> > > > just
> > > > > > > paste
> > > > > > > > > it
> > > > > > > > > > > > here.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > - Tasks can be added to a TaskGroup
> > > > > > > > > > > > > > > - You *can* have dependencies between Tasks in
> > the
> > > > same
> > > > > > > > > > TaskGroup,
> > > > > > > > > > > > but
> > > > > > > > > > > > > > > *cannot* have dependencies between a Task in a
> > > > > TaskGroup
> > > > > > > > and
> > > > > > > > > > > > either a
> > > > > > > > > > > > > > > Task in a different TaskGroup or a Task not in
> > any
> > > > > group
> > > > > > > > > > > > > > > - You *can* have dependencies between a
> TaskGroup
> > > and
> > > > > > > > either
> > > > > > > > > > > other
> > > > > > > > > > > > > > > TaskGroups or Tasks not in any group
> > > > > > > > > > > > > > > - The UI will by default render a TaskGroup as
> a
> > > > single
> > > > > > > > > > "object",
> > > > > > > > > > > > but
> > > > > > > > > > > > > > > which you expand or zoom into in some way
> > > > > > > > > > > > > > > - You'd need some way to determine what the
> > > "status"
> > > > > of a
> > > > > > > > > > > TaskGroup
> > > > > > > > > > > > > was
> > > > > > > > > > > > > > > at least for UI display purposes
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I agree with Chris:
> > > > > > > > > > > > > > - From the backend's view (scheduler &
> executor), I
> > > > think
> > > > > > > > > TaskGroup
> > > > > > > > > > > > > should
> > > > > > > > > > > > > > be ignored during execution. (unless we decide to
> > > > > implement
> > > > > > > > some
> > > > > > > > > > > > metadata
> > > > > > > > > > > > > > operations that allows start/stop a group of
> tasks
> > > > etc.)
> > > > > > > > > > > > > > - From the UI's View, it should be able to pick
> up
> > > the
> > > > > > > > individual
> > > > > > > > > > > > tasks'
> > > > > > > > > > > > > > status and then determine the TaskGroup's status
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Bin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman
> <
> > > > > > > > > > > > > > daniel.imberman@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> I hadn’t thought about using the `>>` operator
> to
> > > tie
> > > > > dags
> > > > > > > > > > together
> > > > > > > > > > > > but
> > > > > > > > > > > > > I
> > > > > > > > > > > > > >> think that sounds pretty great! I wonder if we
> > could
> > > > > > > > essentially
> > > > > > > > > > > write
> > > > > > > > > > > > > in
> > > > > > > > > > > > > >> the ability to set dependencies to all
> > starter-tasks
> > > > for
> > > > > > > that
> > > > > > > > > DAG.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> I’m personally ok with SubDag being a mostly UI
> > > > concept.
> > > > > > It
> > > > > > > > > > doesn’t
> > > > > > > > > > > > need
> > > > > > > > > > > > > >> to execute separately, you’re just adding more
> > tasks
> > > > to
> > > > > > the
> > > > > > > > > queue
> > > > > > > > > > > that
> > > > > > > > > > > > > will
> > > > > > > > > > > > > >> be executed when there are resources available.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <
> > > > > > > > > chris@crpalmer.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >> I agree that SubDAGs are an overly complex
> > > > abstraction.
> > > > > I
> > > > > > > > think
> > > > > > > > > > what
> > > > > > > > > > > > is
> > > > > > > > > > > > > >> needed/useful is a TaskGroup concept. On a high
> > > level
> > > > I
> > > > > > > think
> > > > > > > > > you
> > > > > > > > > > > want
> > > > > > > > > > > > > >> this
> > > > > > > > > > > > > >> functionality:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> - Tasks can be added to a TaskGroup
> > > > > > > > > > > > > >> - You *can* have dependencies between Tasks in
> the
> > > > same
> > > > > > > > > TaskGroup,
> > > > > > > > > > > but
> > > > > > > > > > > > > >> *cannot* have dependencies between a Task in a
> > > > TaskGroup
> > > > > > and
> > > > > > > > > > either
> > > > > > > > > > > a
> > > > > > > > > > > > > >> Task in a different TaskGroup or a Task not in
> any
> > > > group
> > > > > > > > > > > > > >> - You *can* have dependencies between a
> TaskGroup
> > > and
> > > > > > either
> > > > > > > > > other
> > > > > > > > > > > > > >> TaskGroups or Tasks not in any group
> > > > > > > > > > > > > >> - The UI will by default render a TaskGroup as a
> > > > single
> > > > > > > > > "object",
> > > > > > > > > > > but
> > > > > > > > > > > > > >> which you expand or zoom into in some way
> > > > > > > > > > > > > >> - You'd need some way to determine what the
> > "status"
> > > > of
> > > > > a
> > > > > > > > > > TaskGroup
> > > > > > > > > > > > was
> > > > > > > > > > > > > >> at least for UI display purposes
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Not sure if it would need to be a top level
> object
> > > > with
> > > > > > its
> > > > > > > > own
> > > > > > > > > > > > database
> > > > > > > > > > > > > >> table and model or just another attribute on
> > tasks.
> > > I
> > > > > > think
> > > > > > > > you
> > > > > > > > > > > could
> > > > > > > > > > > > > >> build
> > > > > > > > > > > > > >> it in a way such that from the schedulers point
> of
> > > > view
> > > > > a
> > > > > > > DAG
> > > > > > > > > with
> > > > > > > > > > > > > >> TaskGroups doesn't get treated any differently.
> So
> > > it
> > > > > > really
> > > > > > > > > just
> > > > > > > > > > > > > becomes
> > > > > > > > > > > > > >> a
> > > > > > > > > > > > > >> shortcut for setting dependencies between sets
> of
> > > > Tasks,
> > > > > > and
> > > > > > > > > > allows
> > > > > > > > > > > > the
> > > > > > > > > > > > > UI
> > > > > > > > > > > > > >> to simplify the render of the DAG structure.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Chris
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > > > > > > > > > > > <ddavydov@twitter.com.invalid
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> > Agree with James (and think it's actually the
> > more
> > > > > > > important
> > > > > > > > > > issue
> > > > > > > > > > > > to
> > > > > > > > > > > > > >> fix),
> > > > > > > > > > > > > >> > but I am still convinced Ash' idea is the
> right
> > > way
> > > > > > > forward
> > > > > > > > > > (just
> > > > > > > > > > > it
> > > > > > > > > > > > > >> might
> > > > > > > > > > > > > >> > require a bit more work to deprecate than
> adding
> > > > > visual
> > > > > > > > > grouping
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > >> > UI).
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > There was a previous thread about this FYI
> with
> > > more
> > > > > > > context
> > > > > > > > > on
> > > > > > > > > > > why
> > > > > > > > > > > > > >> subdags
> > > > > > > > > > > > > >> > are bad and potential solutions:
> > > > > > > > > > > > > >> >
> > > > > > > > > >
> > > > > https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > > > > > > > > > . A
> > > > > > > > > > > > > > >> > solution I outline there to James's problem is e.g. enabling the `>>`
> > > > > > > > > > > > > > >> > operator for Airflow operators to work with DAGs as well. I
> > > > see
> > > > > > > this
> > > > > > > > > > being
> > > > > > > > > > > > > >> separate
> > > > > > > > > > > > > >> > from Ash' solution for DAG grouping in the UI
> > but
> > > > one
> > > > > of
> > > > > > > the
> > > > > > > > > two
> > > > > > > > > > > > items
> > > > > > > > > > > > > >> > required to replace all existing subdag
> > > > functionality.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > I've been working with subdags for 3 years and
> > > they
> > > > > are
> > > > > > > > > always a
> > > > > > > > > > > > giant
> > > > > > > > > > > > > >> pain
> > > > > > > > > > > > > >> > to use. They are a constant source of user
> > > confusion
> > > > > and
> > > > > > > > > > breakages
> > > > > > > > > > > > > >> during
> > > > > > > > > > > > > >> > upgrades. Would love to see them gone :).
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > On Fri, Jun 12, 2020 at 11:11 AM James Coder <
> > > > > > > > > > jcoder01@gmail.com>
> > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > > I'm not sure I totally agree it's just a UI
> > > > > concept. I
> > > > > > > use
> > > > > > > > > the
> > > > > > > > > > > > > subdag
> > > > > > > > > > > > > >> > > operator to simplify dependencies too. If
> you
> > > > have a
> > > > > > > group
> > > > > > > > > of
> > > > > > > > > > > > tasks
> > > > > > > > > > > > > >> that
> > > > > > > > > > > > > >> > > need to finish before another group of tasks
> > > > start,
> > > > > > > using
> > > > > > > > a
> > > > > > > > > > > subdag
> > > > > > > > > > > > > is
> > > > > > > > > > > > > >> a
> > > > > > > > > > > > > >> > > pretty quick way to set those dependencies
> > and I
> > > > > think
> > > > > > > > also
> > > > > > > > > > make
> > > > > > > > > > > > it
> > > > > > > > > > > > > >> > easier
> > > > > > > > > > > > > >> > > to follow the dag code.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin
> <
> > > > > > > > > > > hamlin.kn@gmail.com>
> > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > > I second Ash’s grouping concept.
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > > Berlin-Taylor
> > > > > <
> > > > > > > > > > > > ash@apache.org
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > > Question:
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > Do we even need the SubDagOperator
> > anymore?
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > Would removing it entirely and just
> > > replacing
> > > > it
> > > > > > > with
> > > > > > > > a
> > > > > > > > > UI
> > > > > > > > > > > > > >> grouping
> > > > > > > > > > > > > >> > > > > concept be conceptually simpler, less to
> > get
> > > > > > wrong,
> > > > > > > > and
> > > > > > > > > > > closer
> > > > > > > > > > > > > to
> > > > > > > > > > > > > >> > what
> > > > > > > > > > > > > >> > > > > users actually want to achieve with
> > subdags?
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > With your proposed change, tasks in
> > subdags
> > > > > could
> > > > > > > > start
> > > > > > > > > > > > running
> > > > > > > > > > > > > in
> > > > > > > > > > > > > >> > > > > parallel (a good change) -- so should we
> > not
> > > > > also
> > > > > > > just
> > > > > > > > > > > > > > _entirely_
> > > > > > > > > > > > > >> > > remove
> > > > > > > > > > > > > >> > > > > the concept of a sub dag and replace it
> > with
> > > > > > > something
> > > > > > > > > > > > simpler.
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > Problems with subdags (I think. I
> haven't
> > > used
> > > > > > them
> > > > > > > > > > > > extensively
> > > > > > > > > > > > > so
> > > > > > > > > > > > > >> > may
> > > > > > > > > > > > > >> > > > > be wrong on some of these):
> > > > > > > > > > > > > >> > > > > - They need their own dag_id, but it
> > has(?)
> > > to
> > > > > be
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > form
> > > > > > > > > > > > > >> > > > > `parent_dag_id.subdag_id`.
> > > > > > > > > > > > > >> > > > > - They need their own schedule_interval,
> > but
> > > > it
> > > > > > has
> > > > > > > to
> > > > > > > > > > match
> > > > > > > > > > > > the
> > > > > > > > > > > > > >> > parent
> > > > > > > > > > > > > >> > > > dag
> > > > > > > > > > > > > >> > > > > - Sub dags can be paused on their own.
> > (Does
> > > > it
> > > > > > make
> > > > > > > > > sense
> > > > > > > > > > > to
> > > > > > > > > > > > do
> > > > > > > > > > > > > >> > this?
> > > > > > > > > > > > > >> > > > > Pausing just a sub dag would mean the
> sub
> > > dag
> > > > > > would
> > > > > > > > > never
> > > > > > > > > > > > > >> execute, so
> > > > > > > > > > > > > >> > > > > the SubDagOperator would fail too.
> > > > > > > > > > > > > >> > > > > - You had to choose the executor to
> > > > > > > > > > > > > > >> > > > operate a
> > > > > > > subdag
> > > > > > > > > with
> > > > > > > > > > > --
> > > > > > > > > > > > > >> always
> > > > > > > > > > > > > >> > a
> > > > > > > > > > > > > >> > > > > bit of a kludge.
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > Thoughts?
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > -ash
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > On Jun 12 2020, at 12:01 pm, Ash
> > > > Berlin-Taylor <
> > > > > > > > > > > > ash@apache.org>
> > > > > > > > > > > > > >> > wrote:
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > > > Work on sub-dags is much needed, I'm
> > > excited
> > > > to
> > > > > > see
> > > > > > > > how
> > > > > > > > > > > this
> > > > > > > > > > > > > >> > > progresses.
> > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > >> > > > > >> - *Unpack SubDags during dag
> parsing*:
> > > This
> > > > > > > > rewrites
> > > > > > > > > > the
> > > > > > > > > > > > > >> > > > > *DagBag.bag_dag*
> > > > > > > > > > > > > >> > > > > >> method to unpack subdag while
> parsing,
> > > and
> > > > it
> > > > > > > will
> > > > > > > > > > give a
> > > > > > > > > > > > > flat
> > > > > > > > > > > > > >> > > > > >> structure at
> > > > > > > > > > > > > >> > > > > >> the task level
> > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > >> > > > > > The serialized_dag representation
> > already
> > > > does
> > > > > > > this
> > > > > > > > I
> > > > > > > > > > > think.
> > > > > > > > > > > > > At
> > > > > > > > > > > > > >> > least
> > > > > > > > > > > > > >> > > > if
> > > > > > > > > > > > > >> > > > > > I've understood your idea here
> > correctly.
> > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > >> > > > > > -ash
> > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > >> > > > > > On Jun 12 2020, at 9:51 am, Xinbin
> > Huang <
> > > > > > > > > > > > > bin.huangxb@gmail.com
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > > > wrote:
> > > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > >> > > > > >> Hi everyone,
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> Sending a message to everyone and
> > collect
> > > > > > > feedback
> > > > > > > > on
> > > > > > > > > > the
> > > > > > > > > > > > > >> AIP-34
> > > > > > > > > > > > > >> > on
> > > > > > > > > > > > > >> > > > > >> rewriting SubDagOperator. This was
> > > > previously
> > > > > > > > briefly
> > > > > > > > > > > > > >> mentioned in
> > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > >> > > > > >> discussion about what needs to be
> done
> > > for
> > > > > > > Airflow
> > > > > > > > > 2.0,
> > > > > > > > > > > and
> > > > > > > > > > > > > >> one of
> > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > >> > > > > >> ideas is to make SubDagOperator
> attach
> > > > tasks
> > > > > > back
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > > root
> > > > > > > > > > > > > >> DAG.
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> This AIP-34 focuses on solving
> > > > SubDagOperator
> > > > > > > > related
> > > > > > > > > > > > issues
> > > > > > > > > > > > > by
> > > > > > > > > > > > > >> > > > > reattaching
> > > > > > > > > > > > > >> > > > > >> all tasks back to the root dag while
> > > > > respecting
> > > > > > > > > > > > dependencies
> > > > > > > > > > > > > >> > during
> > > > > > > > > > > > > >> > > > > >> parsing. The original grouping effect
> > on
> > > > the
> > > > > UI
> > > > > > > > will
> > > > > > > > > be
> > > > > > > > > > > > > >> achieved
> > > > > > > > > > > > > >> > > > through
> > > > > > > > > > > > > >> > > > > >> grouping related tasks by metadata.
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> This also makes the dag_factory
> > function
> > > > more
> > > > > > > > > reusable
> > > > > > > > > > > > > because
> > > > > > > > > > > > > >> you
> > > > > > > > > > > > > >> > > > don't
> > > > > > > > > > > > > >> > > > > >> need to have parent_dag_name and
> > > > > child_dag_name
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > > function
> > > > > > > > > > > > > >> > > > > signature
> > > > > > > > > > > > > >> > > > > >> anymore.
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> Changes proposed:
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> - *Unpack SubDags during dag
> parsing*:
> > > This
> > > > > > > > rewrites
> > > > > > > > > > the
> > > > > > > > > > > > > >> > > > > *DagBag.bag_dag*
> > > > > > > > > > > > > >> > > > > >> method to unpack subdag while
> parsing,
> > > and
> > > > it
> > > > > > > will
> > > > > > > > > > give a
> > > > > > > > > > > > > flat
> > > > > > > > > > > > > >> > > > > >> structure at
> > > > > > > > > > > > > >> > > > > >> the task level
> > > > > > > > > > > > > >> > > > > >> - *Simplify SubDagOperator*: The new
> > > > > > > SubDagOperator
> > > > > > > > > > acts
> > > > > > > > > > > > > like a
> > > > > > > > > > > > > >> > > > > >> container and most of the original
> > > methods
> > > > > are
> > > > > > > > > removed.
> > > > > > > > > > > The
> > > > > > > > > > > > > >> > > > > >> signature is
> > > > > > > > > > > > > >> > > > > >> also changed to *subdag_factory *with
> > > > > > > *subdag_args
> > > > > > > > > *and
> > > > > > > > > > > > > >> > > > > *subdag_kwargs*.
> > > > > > > > > > > > > >> > > > > >> This is similar to the PythonOperator
> > > > > > signature.
> > > > > > > > > > > > > >> > > > > >> - *Add a TaskGroup model and add
> > > > > current_group
> > > > > > &
> > > > > > > > > > > > parent_group
> > > > > > > > > > > > > >> > > > > attributes
> > > > > > > > > > > > > >> > > > > >> to BaseOperator*: This metadata is
> used
> > > to
> > > > > > group
> > > > > > > > > tasks
> > > > > > > > > > > for
> > > > > > > > > > > > > >> > > > > >> rendering at
> > > > > > > > > > > > > >> > > > > >> UI level. It may potentially extend
> > > further
> > > > > to
> > > > > > > > group
> > > > > > > > > > > > > arbitrary
> > > > > > > > > > > > > >> > > tasks
> > > > > > > > > > > > > >> > > > > >> outside the context of subdag to
> allow
> > > > > > > group-level
> > > > > > > > > > > > operations
> > > > > > > > > > > > > >> > > (i.e.
> > > > > > > > > > > > > >> > > > > >> stop/trigger a group of task within
> the
> > > > dag)
> > > > > > > > > > > > > >> > > > > >> - *Webserver UI for SubDag*: Proposed
> > UI
> > > > > > > > modification
> > > > > > > > > > to
> > > > > > > > > > > > > allow
> > > > > > > > > > > > > >> > > > > >> (un)collapse a group of tasks for a
> > flat
> > > > > > > structure
> > > > > > > > to
> > > > > > > > > > > pair
> > > > > > > > > > > > > with
> > > > > > > > > > > > > >> > > the
> > > > > > > > > > > > > >> > > > > first
> > > > > > > > > > > > > >> > > > > >> change instead of the original
> > > hierarchical
> > > > > > > > > structure.
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> Please see related documents and PRs
> > for
> > > > > > details:
> > > > > > > > > > > > > >> > > > > >> AIP:
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > > >> > > > > >> Original Issue: https://github.com/apache/airflow/issues/8078
> > > > > > > > > > > > > > >> > > > > >> Draft PR: https://github.com/apache/airflow/pull/9243
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> Please let me know if there are any
> > > aspects
> > > > > > that
> > > > > > > > you
> > > > > > > > > > > > > >> > agree/disagree
> > > > > > > > > > > > > >> > > > > >> with or
> > > > > > > > > > > > > >> > > > > >> need more clarification (especially
> the
> > > > third
> > > > > > > > change
> > > > > > > > > > > > > regarding
> > > > > > > > > > > > > >> > > > > TaskGroup).
> > > > > > > > > > > > > >> > > > > >> Any comments are welcome and I am
> > looking
> > > > > > forward
> > > > > > > > to
> > > > > > > > > > it!
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> Cheers
> > > > > > > > > > > > > >> > > > > >> Bin
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > > >> > > > Kyle Hamlin
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Thanks & Regards
> > > > > > > > > > > > > Poornima
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Jarek Potiuk
> > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > >
> > > > > M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > <+48%20660%20796%20129>>
> > > > > [image: Polidea] <https://www.polidea.com/>
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >
> > > M: +48 660 796 129 <+48%20660%20796%20129> <+48660796129
> > > <+48%20660%20796%20129>>
> > > [image: Polidea] <https://www.polidea.com/>
> > >
> >
> >
> > --
> >
> > *Jacob Ferriero*
> >
> > Strategic Cloud Engineer: Data Engineering
> >
> > jferriero@google.com
> >
> > 617-714-2509
> >
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by Yu Qian <yu...@gmail.com>.
Hi all, I've added the basic UI changes to my proposed implementation of
TaskGroup as a UI grouping concept:
https://github.com/apache/airflow/pull/10153

I think Chris had a pretty good specification of TaskGroup, so I'm quoting
it here. The only thing I don't fully agree with is the restriction
"... *cannot* have dependencies between a Task in a TaskGroup and either a
Task in a different TaskGroup or a Task not in any group". I think this is
overly restrictive. Since TaskGroup is a UI concept, tasks can have
dependencies on tasks in other TaskGroups or on tasks not in any TaskGroup.
In my PR, this is allowed. The graph edges update accordingly when
TaskGroups are expanded or collapsed. TaskGroup only helps make the UI look
less crowded. Under the hood, everything is still a DAG of tasks and edges,
so things work normally. Here's a screenshot
<https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
of the UI interaction.
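
To make the expand/collapse behaviour concrete, here is a minimal
plain-Python sketch of how the visible edges could be derived from the flat
task-level edges. This is not the code in the PR: the function name
visible_edges, its arguments and the example task ids are all hypothetical,
and it assumes group ids never clash with task ids.

```python
def visible_edges(edges, task_group, collapsed):
    """Return the edges to draw when the groups in `collapsed` are folded.

    edges      -- iterable of (upstream_task_id, downstream_task_id)
    task_group -- dict mapping task_id -> group_id (ungrouped tasks are absent)
    collapsed  -- set of group_ids currently drawn as a single node
    """
    def node(task_id):
        group = task_group.get(task_id)
        # A task inside a collapsed group is drawn as the group node itself.
        return group if group in collapsed else task_id

    drawn = set()
    for upstream, downstream in edges:
        src, dst = node(upstream), node(downstream)
        if src != dst:  # edges internal to a collapsed group disappear
            drawn.add((src, dst))
    return sorted(drawn)


# The underlying DAG stays flat: plain tasks and edges. Note that task2
# (inside the group) has a dependency on "audit" (outside any group).
edges = [
    ("start", "task1"), ("start", "task2"),
    ("task1", "task3"), ("task2", "task3"),
    ("task2", "audit"), ("task3", "end"),
]
task_group = {"task1": "download", "task2": "download", "task3": "download"}

expanded_view = visible_edges(edges, task_group, collapsed=set())
collapsed_view = visible_edges(edges, task_group, collapsed={"download"})
print(expanded_view)
print(collapsed_view)
```

In this sketch the scheduler would only ever see `edges`. The group mapping
affects what is drawn, which is why a dependency that crosses a group
boundary does not need to be forbidden.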











   - Tasks can be added to a TaskGroup
   - You *can* have dependencies between Tasks in the same TaskGroup, but
     *cannot* have dependencies between a Task in a TaskGroup and either a
     Task in a different TaskGroup or a Task not in any group
   - You *can* have dependencies between a TaskGroup and either other
     TaskGroups or Tasks not in any group
   - The UI will by default render a TaskGroup as a single "object", but
     which you expand or zoom into in some way
   - You'd need some way to determine what the "status" of a TaskGroup was
     at least for UI display purposes
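
For the last point, one possible way to derive a display status for a
collapsed group is to pick the "most urgent" state among its tasks. The
sketch below is only an assumption about how this could work, not what the
PR does: the priority order and the group_status helper are hypothetical,
and None stands for a task that has no state yet.

```python
# Hypothetical priority order: the first state found among the tasks wins.
# None (no state yet) ranks above "success" so that a partially run group
# is not shown as successful.
STATE_PRIORITY = [
    "failed",
    "upstream_failed",
    "up_for_retry",
    "running",
    "queued",
    "scheduled",
    None,
    "success",
    "skipped",
]


def group_status(task_states):
    """Collapse the states of a group's tasks into one state for display."""
    states = set(task_states)
    for state in STATE_PRIORITY:
        if state in states:
            return state
    return None  # empty group or only unknown states


running_group = group_status(["success", "running", "success"])
failed_group = group_status(["success", "failed", "running"])
done_group = group_status(["success", "skipped"])
pending_group = group_status(["success", None])
```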


Regarding Jake's comment, I agree it's possible to implement the "retrying
tasks in a group" pattern he mentioned as an optional feature of TaskGroup,
although that may go against having TaskGroup as a pure UI concept. For the
motivating example Jake provided, I suggest implementing both
SubmitLongRunningJobTask and PollJobStatusSensor in a single operator. It
can do something like what BaseSensorOperator.execute() does in "reschedule"
mode, i.e. it first executes some code to submit the long-running job to
the external service and stores the state (e.g. in XCom), then reschedules
itself. Subsequent runs then poke for the completion state.
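
Roughly, the control flow I have in mind looks like the sketch below. It is
plain Python that only simulates the idea and does not use the Airflow API:
Reschedule stands in for the reschedule signal a sensor raises, the
state_store dict stands in for XCom, and FakeJobService, SubmitAndPollOperator
and run_until_done are hypothetical names.

```python
class Reschedule(Exception):
    """Stand-in for the signal that asks the scheduler to run the task again later."""


class FakeJobService:
    """Hypothetical external service whose jobs finish after a few polls."""

    def __init__(self, polls_until_done):
        self._polls_until_done = polls_until_done
        self._remaining = {}
        self.submitted = 0

    def submit(self):
        self.submitted += 1
        job_id = f"job-{self.submitted}"
        self._remaining[job_id] = self._polls_until_done
        return job_id

    def is_done(self, job_id):
        self._remaining[job_id] -= 1
        return self._remaining[job_id] <= 0


class SubmitAndPollOperator:
    """Submits the job on the first run, then only pokes on later runs."""

    def __init__(self, service, state_store):
        self.service = service
        self.state_store = state_store  # stand-in for XCom

    def execute(self):
        job_id = self.state_store.get("job_id")
        if job_id is None:
            # First run: submit, remember the job id, free the worker slot.
            self.state_store["job_id"] = self.service.submit()
            raise Reschedule("job submitted, poll later")
        if not self.service.is_done(job_id):
            # Later runs: the job is still running, free the slot again.
            raise Reschedule("job still running")
        return job_id


def run_until_done(operator, max_runs=10):
    """Tiny stand-in for the scheduler re-running a rescheduled task."""
    for run in range(1, max_runs + 1):
        try:
            return operator.execute(), run
        except Reschedule:
            continue
    raise RuntimeError("job did not finish within max_runs")


service = FakeJobService(polls_until_done=2)
result = run_until_done(SubmitAndPollOperator(service, state_store={}))
```

Because the job id is kept in the state store, a retry of this single task
could decide to resubmit by clearing that entry, which covers the "retry the
submit if the poll fails" case without needing a group-level retry.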


On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero <jf...@google.com.invalid>
wrote:

> I really like this idea of a TaskGroup container as I think this will be
> much easier to use than SubDag.
>
> I'd like to propose an optional behavior for special retry mechanics via a
> TaskGroup.retry_all property.
> This way I could use TaskGroup to replace my favorite use of SubDag for
> atomically retrying tasks of the pattern "act on external state then
> reschedule poll until desired state reached".
>
> Motivating use case I have for a SubDag is a very simple two-task group
> [SubmitLongRunningJobTask >> PollJobStatusSensor].
> I use SubDag because it gives me an easy way to retry the SubmitJobTask
> if something about the PollJobSensor fails.
> This pattern would be really nice for jobs that are expected to run a long
> time (because the sensor can use reschedule mode, freeing up slots)
> but might fail for a retryable reason.
> However, using SubDag to meet this use case defeats the purpose because
> SubDag infamously
> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> blocks a "controller" slot for the entire duration.
> This may feel like a cyclic behavior but in reality it is very common for a
> single operator to submit a job and wait until it is done.
> We could use this case to refactor many operators (e.g. BQ, Dataproc,
> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with an
> optional reschedule mode if the user knows that this job may take a long
> time.
>
> I'd be happy to do the development work on adding this specific retry
> behavior to TaskGroup once the base concept is implemented if others in the
> community would find this a useful feature.
>
> Cheers,
> Jake
>
> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> > All for it :) . I think we are getting closer to have regular planning
> > and making some structured approach to 2.0 and starting task force for it
> > soon, so I think this should be perfectly fine to discuss and even start
> > implementing what's beyond as soon as we make sure that we are
> > prioritizing 2.0 work.
> >
> > J,
> >
> >
> > On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yu...@gmail.com> wrote:
> >
> > > Hi Jarek,
> > >
> > > I agree we should not change the behaviour of the existing
> SubDagOperator
> > > till Airflow 2.1. Is it okay to continue the discussion about TaskGroup
> > as
> > > a brand new concept/feature independent from the existing
> SubDagOperator?
> > > In other words, shall we add TaskGroup as a UI grouping concept like
> Ash
> > > suggested, and not touch SubDagOperator atl all. Whenever we are ready
> > with
> > > TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
> > >
> > > I really like Ash's idea of simplifying the SubDagOperator idea into a
> > > simple UI grouping concept. I think Xinbin's idea of "reattaching all
> the
> > > tasks to the root DAG" is the way to go. And I see James pointed out we
> > > need some helper functions to simplify dependencies setting of
> TaskGroup.
> > > Xinbin put up a pretty elegant example in his PR
> > > <https://github.com/apache/airflow/pull/9243>. I think having
> TaskGroup
> > as
> > > a UI concept should be a relatively small change. We can simplify
> > Xinbin's
> > > PR further. So I put up this alternative proposal here:
> > > https://github.com/apache/airflow/pull/10153
> > >
> > > I have not done any UI changes due to lack of experience with web UI.
> If
> > > anyone's interested, please take a look at the PR.
> > >
> > > Qian
> > >
> > > On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <Jarek.Potiuk@polidea.com
> >
> > > wrote:
> > >
> > > > Similar point here to the other ideas that are popping up. Maybe we
> > > should
> > > > just focus on completing 2.0 and make all discussions about further
> > > > improvements to 2.1? While those are important discussions (and we
> > should
> > > > continue them in the  near future !) I think at this point focusing
> on
> > > > delivering 2.0 in its current shape should be our focus now ?
> > > >
> > > > J.
> > > >
> > > > On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <bi...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Daniel
> > > > >
> > > > > > I agree that the TaskGroup should have the same API as a DAG object
> > > > > > related to task dependencies, but it will not have anything related
> > > > > > to actual execution or scheduling.
> > > > > > I will update the AIP according to this over the weekend.
> > > > > >
> > > > > > > We could even make a “DAGTemplate” object s.t. when you import the
> > > > > > > object you can import it with parameters to determine the shape of
> > > > > > > the DAG.
> > > > > >
> > > > > > Can you elaborate a bit more on this? Does it serve a similar
> > > > > > purpose as a DAG factory function?
> > > > >
> > > > >
> > > > >
> > > > > > On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <daniel.imberman@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > Why not give the TaskGroup the same API as a DAG object (e.g. the
> > > > > > > bitwise operator for task dependencies)? We could even make a
> > > > > > > “DAGTemplate” object s.t. when you import the object you can import
> > > > > > > it with parameters to determine the shape of the DAG.
> > > > > >
> > > > > >
> > > > > > > On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <bin.huangxb@gmail.com>
> > > > > > > wrote:
> > > > > > > The TaskGroup will not take schedule interval as a parameter itself,
> > > > > > > and it depends on the DAG where it attaches to. In my opinion, the
> > > > > > > TaskGroup will only contain a group of tasks with interdependencies,
> > > > > > > and the TaskGroup behaves like a task. It doesn't contain any
> > > > > > > execution/scheduling logic (i.e. schedule_interval, concurrency,
> > > > > > > max_active_runs etc.) like a DAG does.
> > > > > > >
> > > > > > > > For example, there is the scenario that the schedule interval of
> > > > > > > > DAG is 1 hour and the schedule interval of TaskGroup is 20 min.
> > > > > > >
> > > > > > > I am curious why you ask this. Is this a use case that you want to
> > > > > > > achieve?
> > > > > >
> > > > > > Bin
> > > > > >
> > > > > > On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <th...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Bin,
> > > > > > > Using TaskGroup, Is the schedule interval of TaskGroup the same
> > as
> > > > the
> > > > > > > parent DAG? My main concern is whether the schedule interval of
> > > > > TaskGroup
> > > > > > > could be different with that of the DAG? For example, there is
> > the
> > > > > > scenario
> > > > > > > that the schedule interval of DAG is 1 hour and the schedule
> > > interval
> > > > > of
> > > > > > > TaskGroup is 20 min.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Nicholas
> > > > > > >
> > > > > > > On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> > > bin.huangxb@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Nicholas,
> > > > > > > >
> > > > > > > > I am not sure about the old behavior of SubDagOperator, maybe
> > it
> > > > will
> > > > > > > throw
> > > > > > > > an error? But in the original proposal, the subdag's
> > > > > schedule_interval
> > > > > > > will
> > > > > > > > be ignored. Or if we decide to use TaskGroup to replace
> SubDag,
> > > > there
> > > > > > > will
> > > > > > > > be no subdag schedule_interval.
> > > > > > > >
> > > > > > > > Bin
> > > > > > > >
> > > > > > > > On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <
> thanosxnicholas@gmail.com
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Bin,
> > > > > > > > > Thanks for your good proposal. I was confused whether the
> > > > schedule
> > > > > > > > > interval of SubDAG is different from that of the parent
> DAG?
> > I
> > > > have
> > > > > > > > > discussed with Jiajie Zhong about the schedule interval of
> > > > SubDAG.
> > > > > If
> > > > > > > the
> > > > > > > > > SubDagOperator has a different schedule interval, what will
> > > > happen
> > > > > > for
> > > > > > > > the
> > > > > > > > > scheduler to schedule the parent DAG?
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Nicholas Jiang
> > > > > > > > >
> > > > > > > > > On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > > > bin.huangxb@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thank you, Max, Kaxil, and everyone's feedback!
> > > > > > > > > >
> > > > > > > > > > > I have rethought the concept of subdag and task groups. I think
> > > > > > > > > > > the better way to approach this is to entirely remove subdag and
> > > > > > > > > > > introduce the concept of TaskGroup, which is a container of tasks
> > > > > > > > > > > along with their dependencies *without execution/scheduling logic
> > > > > > > > > > > as a DAG*. The only purpose of it is to group a list of tasks, but
> > > > > > > > > > > you still need to add it to a DAG for execution.
> > > > > > > > > >
> > > > > > > > > > Here is a small code snippet.
> > > > > > > > > >
> > > > > > > > > > > ```
> > > > > > > > > > > class TaskGroup:
> > > > > > > > > > >     """
> > > > > > > > > > >     A TaskGroup contains a group of tasks.
> > > > > > > > > > >
> > > > > > > > > > >     If default_args is missing, it will take default args from the DAG.
> > > > > > > > > > >     """
> > > > > > > > > > >     def __init__(self, group_id, default_args=None):
> > > > > > > > > > >         pass
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > """
> > > > > > > > > > > You can add tasks to a task group similar to adding tasks to a DAG
> > > > > > > > > > >
> > > > > > > > > > > This can be declared in a separate file from the dag file
> > > > > > > > > > > """
> > > > > > > > > > > download_group = TaskGroup(group_id='download', default_args=default_args)
> > > > > > > > > > > download_group.add_task(task1)
> > > > > > > > > > > task2.dag = download_group
> > > > > > > > > > >
> > > > > > > > > > > with download_group:
> > > > > > > > > > >     task3 = DummyOperator(task_id='task3')
> > > > > > > > > > >
> > > > > > > > > > > [task1, task2] >> task3
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > """Add it to a DAG for execution"""
> > > > > > > > > > > with DAG(dag_id='start_download_dag', default_args=default_args,
> > > > > > > > > > >          schedule_interval='@daily', ...) as dag:
> > > > > > > > > > >     start = DummyOperator(task_id='start')
> > > > > > > > > > >     start >> download_group
> > > > > > > > > > >     # this is equivalent to
> > > > > > > > > > >     # start >> [task1, task2] >> task3
> > > > > > > > > > > ```
> > > > > > > > > >
> > > > > > > > > > > With this, we can still reuse a group of tasks and set dependencies
> > > > > > > > > > > between them; it avoids the boilerplate code from using
> > > > > > > > > > > SubDagOperator, and we can declare dependencies as
> > > > > > > > > > > `task >> task_group >> task`.
> > > > > > > > > > >
> > > > > > > > > > > User migration wise, we can introduce it before Airflow 2.0 and
> > > > > > > > > > > allow gradual transition. Then we can decide if we still want to
> > > > > > > > > > > keep the SubDagOperator or simply remove it.
> > > > > > > > > >
> > > > > > > > > > Any thoughts?
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Bin
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > > > > > > > > > maximebeauchemin@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > +1, proposal looks good.
> > > > > > > > > > >
> > > > > > > > > > > The original intention was really to have tasks groups
> > and
> > > a
> > > > > > > > > zoom-in/out
> > > > > > > > > > in
> > > > > > > > > > > the UI. The original reasoning was to reuse the DAG
> > object
> > > > > since
> > > > > > it
> > > > > > > > is
> > > > > > > > > a
> > > > > > > > > > > group of tasks, but as highlighted here it does create
> > > > > underlying
> > > > > > > > > > > confusions since a DAG is much more than just a group
> of
> > > > tasks.
> > > > > > > > > > >
> > > > > > > > > > > Max
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > > > > > > > > joshipoornima06@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thank you for your email.
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > > > > > > bin.huangxb@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > > > - *Unpack SubDags during dag parsing*: This
> > > rewrites
> > > > > the
> > > > > > > > > > > > > *DagBag.bag_dag*
> > > > > > > > > > > > > > > method to unpack subdag while parsing, and it
> > will
> > > > > give a
> > > > > > > > > flat
> > > > > > > > > > > > > > > structure at
> > > > > > > > > > > > > > > the task level
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The serialized_dag representation already does
> > this I
> > > > > > think.
> > > > > > > At
> > > > > > > > > > least
> > > > > > > > > > > > if
> > > > > > > > > > > > > > I've understood your idea here correctly.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am not sure about serialized_dag representation,
> > but
> > > at
> > > > > > least
> > > > > > > > it
> > > > > > > > > > will
> > > > > > > > > > > > > still keep the subdag entry in the DAG table? In my
> > > > > proposal
> > > > > > as
> > > > > > > > > also
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > > draft PR, the idea is to *extract the tasks from
> the
> > > > subdag
> > > > > > and
> > > > > > > > add
> > > > > > > > > > > them
> > > > > > > > > > > > > back to the root_dag. *So the runtime DAG graph
> will
> > > look
> > > > > > > exactly
> > > > > > > > > the
> > > > > > > > > > > > > same as without subdag but with metadata attached
> to
> > > > those
> > > > > > > > > sections.
> > > > > > > > > > > > These
> > > > > > > > > > > > > metadata will be later on used to render in the UI.
> > So
> > > > > after
> > > > > > > > > parsing
> > > > > > > > > > (
> > > > > > > > > > > > > *DagBag.process_file()*), it will just output the
> > > > *root_dag
> > > > > > > > > *instead
> > > > > > > > > > of
> > > > > > > > > > > > *root_dag +
> > > > > > > > > > > > > subdag + subdag + nested subdag* etc.
> > > > > > > > > > > > >
> > > > > > > > > > > > > - e.g. section-1-* will have metadata
> > > > > > > current_group=section-1,
> > > > > > > > > > > > > parent_group=<the-root-dag-id> (welcome for naming
> > > > > > > > suggestions),
> > > > > > > > > > the
> > > > > > > > > > > > > reason for parent_group is that we can have nested
> > > group
> > > > > and
> > > > > > > > > still
> > > > > > > > > > > be
> > > > > > > > > > > > > able to capture the dependency.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Runtime DAG:
> > > > > > > > > > > > > [image: image.png]
> > > > > > > > > > > > >
> > > > > > > > > > > > > While at the UI, what we see would be something
> like
> > > this
> > > > > by
> > > > > > > > > > utilizing
> > > > > > > > > > > > the
> > > > > > > > > > > > > metadata, and then we can expand or zoom into in
> some
> > > > way.
> > > > > > > > > > > > > [image: image.png]
> > > > > > > > > > > > >
> > > > > > > > > > > > > The benefits I can see is that:
> > > > > > > > > > > > > 1. We don't need to deal with the extra complexity
> of
> > > > > SubDag
> > > > > > > for
> > > > > > > > > > > > execution
> > > > > > > > > > > > > and scheduling. It will be the same as not using
> > > SubDag.
> > > > > > > > > > > > > 2. Still have the benefits of modularized and
> > reusable
> > > > dag
> > > > > > code
> > > > > > > > and
> > > > > > > > > > > > > declare dependencies between them. And with the new
> > > > > > > > SubDagOperator
> > > > > > > > > > (see
> > > > > > > > > > > > AIP
> > > > > > > > > > > > > or draft PR), we can use the same dag_factory
> > function
> > > > for
> > > > > > > > > > generating 1
> > > > > > > > > > > > > dag, a lot of dynamic dags, or used for SubDag (in
> > this
> > > > > case,
> > > > > > > it
> > > > > > > > > will
> > > > > > > > > > > > just
> > > > > > > > > > > > > extract all underlying tasks and append to the root
> > > dag).
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Then it gets to the idea of replacing subdag
> with a
> > > > > > > simpler
> > > > > > > > > > > concept
> > > > > > > > > > > > > by Ash: the proposed change basically drains out
> the
> > > > > > > contents
> > > > > > > > > of
> > > > > > > > > > a
> > > > > > > > > > > > SubDag
> > > > > > > > > > > > > and becomes more like
> > > > > > > > > ExtractSubdagTasksAndAppendToRootdagOperator
> > > > > > > > > > > > (forgive
> > > > > > > > > > > > > me about the crazy name..). In this case, it is
> still
> > > > > > > > necessary
> > > > > > > > > to
> > > > > > > > > > > > keep the
> > > > > > > > > > > > > concept of subdag as it is nothing more than a
> name?
> > > > > > > > > > > > >
> > > > > > > > > > > > > That's why the TaskGroup idea comes up. Thanks
> Chris
> > > > Palmer
> > > > > > for
> > > > > > > > > > helping
> > > > > > > > > > > > > conceptualize the functionality of TaskGroup, I
> will
> > > just
> > > > > > paste
> > > > > > > > it
> > > > > > > > > > > here.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > - Tasks can be added to a TaskGroup
> > > > > > > > > > > > > > - You *can* have dependencies between Tasks in
> the
> > > same
> > > > > > > > > TaskGroup,
> > > > > > > > > > > but
> > > > > > > > > > > > > > *cannot* have dependencies between a Task in a
> > > > TaskGroup
> > > > > > > and
> > > > > > > > > > > either a
> > > > > > > > > > > > > > Task in a different TaskGroup or a Task not in
> any
> > > > group
> > > > > > > > > > > > > > - You *can* have dependencies between a TaskGroup
> > and
> > > > > > > either
> > > > > > > > > > other
> > > > > > > > > > > > > > TaskGroups or Tasks not in any group
> > > > > > > > > > > > > > - The UI will by default render a TaskGroup as a
> > > single
> > > > > > > > > "object",
> > > > > > > > > > > but
> > > > > > > > > > > > > > which you expand or zoom into in some way
> > > > > > > > > > > > > > - You'd need some way to determine what the
> > "status"
> > > > of a
> > > > > > > > > > TaskGroup
> > > > > > > > > > > > was
> > > > > > > > > > > > > > at least for UI display purposes
> > > > > > > > > > > > >
> > > > > > > > > > > > > I agree with Chris:
> > > > > > > > > > > > > - From the backend's view (scheduler & executor), I
> > > think
> > > > > > > > TaskGroup
> > > > > > > > > > > > should
> > > > > > > > > > > > > be ignored during execution. (unless we decide to
> > > > implement
> > > > > > > some
> > > > > > > > > > > metadata
> > > > > > > > > > > > > operations that allows start/stop a group of tasks
> > > etc.)
> > > > > > > > > > > > > - From the UI's View, it should be able to pick up
> > the
> > > > > > > individual
> > > > > > > > > > > tasks'
> > > > > > > > > > > > > status and then determine the TaskGroup's status
> > > > > > > > > > > > >
> > > > > > > > > > > > > Bin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman <
> > > > > > > > > > > > > daniel.imberman@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >> I hadn’t thought about using the `>>` operator to
> > tie
> > > > dags
> > > > > > > > > together
> > > > > > > > > > > but
> > > > > > > > > > > > I
> > > > > > > > > > > > >> think that sounds pretty great! I wonder if we
> could
> > > > > > > essentially
> > > > > > > > > > write
> > > > > > > > > > > > in
> > > > > > > > > > > > >> the ability to set dependencies to all
> starter-tasks
> > > for
> > > > > > that
> > > > > > > > DAG.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> I’m personally ok with SubDag being a mostly UI
> > > concept.
> > > > > It
> > > > > > > > > doesn’t
> > > > > > > > > > > need
> > > > > > > > > > > > >> to execute separately, you’re just adding more
> tasks
> > > to
> > > > > the
> > > > > > > > queue
> > > > > > > > > > that
> > > > > > > > > > > > will
> > > > > > > > > > > > >> be executed when there are resources available.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <
> > > > > > > > chris@crpalmer.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >> I agree that SubDAGs are an overly complex
> > > abstraction.
> > > > I
> > > > > > > think
> > > > > > > > > what
> > > > > > > > > > > is
> > > > > > > > > > > > >> needed/useful is a TaskGroup concept. On a high
> > level
> > > I
> > > > > > think
> > > > > > > > you
> > > > > > > > > > want
> > > > > > > > > > > > >> this
> > > > > > > > > > > > >> functionality:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> - Tasks can be added to a TaskGroup
> > > > > > > > > > > > >> - You *can* have dependencies between Tasks in the
> > > same
> > > > > > > > TaskGroup,
> > > > > > > > > > but
> > > > > > > > > > > > >> *cannot* have dependencies between a Task in a
> > > TaskGroup
> > > > > and
> > > > > > > > > either
> > > > > > > > > > a
> > > > > > > > > > > > >> Task in a different TaskGroup or a Task not in any
> > > group
> > > > > > > > > > > > >> - You *can* have dependencies between a TaskGroup
> > and
> > > > > either
> > > > > > > > other
> > > > > > > > > > > > >> TaskGroups or Tasks not in any group
> > > > > > > > > > > > >> - The UI will by default render a TaskGroup as a
> > > single
> > > > > > > > "object",
> > > > > > > > > > but
> > > > > > > > > > > > >> which you expand or zoom into in some way
> > > > > > > > > > > > >> - You'd need some way to determine what the
> "status"
> > > of
> > > > a
> > > > > > > > > TaskGroup
> > > > > > > > > > > was
> > > > > > > > > > > > >> at least for UI display purposes
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Not sure if it would need to be a top level object
> > > with
> > > > > its
> > > > > > > own
> > > > > > > > > > > database
> > > > > > > > > > > > >> table and model or just another attribute on
> tasks.
> > I
> > > > > think
> > > > > > > you
> > > > > > > > > > could
> > > > > > > > > > > > >> build
> > > > > > > > > > > > >> it in a way such that from the schedulers point of
> > > view
> > > > a
> > > > > > DAG
> > > > > > > > with
> > > > > > > > > > > > >> TaskGroups doesn't get treated any differently. So
> > it
> > > > > really
> > > > > > > > just
> > > > > > > > > > > > becomes
> > > > > > > > > > > > >> a
> > > > > > > > > > > > >> shortcut for setting dependencies between sets of
> > > Tasks,
> > > > > and
> > > > > > > > > allows
> > > > > > > > > > > the
> > > > > > > > > > > > UI
> > > > > > > > > > > > >> to simplify the render of the DAG structure.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Chris
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > > > > > > > > > > <ddavydov@twitter.com.invalid
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> > Agree with James (and think it's actually the
> more
> > > > > > important
> > > > > > > > > issue
> > > > > > > > > > > to
> > > > > > > > > > > > >> fix),
> > > > > > > > > > > > >> > but I am still convinced Ash' idea is the right
> > way
> > > > > > forward
> > > > > > > > > (just
> > > > > > > > > > it
> > > > > > > > > > > > >> might
> > > > > > > > > > > > >> > require a bit more work to deprecate than adding
> > > > visual
> > > > > > > > grouping
> > > > > > > > > > in
> > > > > > > > > > > > the
> > > > > > > > > > > > >> > UI).
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > There was a previous thread about this FYI with
> > more
> > > > > > context
> > > > > > > > on
> > > > > > > > > > why
> > > > > > > > > > > > >> subdags
> > > > > > > > > > > > >> > are bad and potential solutions:
> > > > > > > > > > > > >> >
> > > > > > > > >
> > > > https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > > > > > > > > . A
> > > > > > > > > > > > >> > solution I outline there to Jame's problem is
> e.g.
> > > > > > enabling
> > > > > > > > the
> > > > > > > > > >>
> > > > > > > > > > > > >> operator
> > > > > > > > > > > > >> > for Airflow operators to work with DAGs as
> well. I
> > > see
> > > > > > this
> > > > > > > > > being
> > > > > > > > > > > > >> separate
> > > > > > > > > > > > >> > from Ash' solution for DAG grouping in the UI
> but
> > > one
> > > > of
> > > > > > the
> > > > > > > > two
> > > > > > > > > > > items
> > > > > > > > > > > > >> > required to replace all existing subdag
> > > functionality.
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > I've been working with subdags for 3 years and
> > they
> > > > are
> > > > > > > > always a
> > > > > > > > > > > giant
> > > > > > > > > > > > >> pain
> > > > > > > > > > > > >> > to use. They are a constant source of user
> > confusion
> > > > and
> > > > > > > > > breakages
> > > > > > > > > > > > >> during
> > > > > > > > > > > > >> > upgrades. Would love to see them gone :).
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > On Fri, Jun 12, 2020 at 11:11 AM James Coder <
> > > > > > > > > jcoder01@gmail.com>
> > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > > I'm not sure I totally agree it's just a UI
> > > > concept. I
> > > > > > use
> > > > > > > > the
> > > > > > > > > > > > subdag
> > > > > > > > > > > > >> > > operator to simplify dependencies too. If you
> > > have a
> > > > > > group
> > > > > > > > of
> > > > > > > > > > > tasks
> > > > > > > > > > > > >> that
> > > > > > > > > > > > >> > > need to finish before another group of tasks
> > > start,
> > > > > > using
> > > > > > > a
> > > > > > > > > > subdag
> > > > > > > > > > > > is
> > > > > > > > > > > > >> a
> > > > > > > > > > > > >> > > pretty quick way to set those dependencies
> and I
> > > > think
> > > > > > > also
> > > > > > > > > make
> > > > > > > > > > > it
> > > > > > > > > > > > >> > easier
> > > > > > > > > > > > >> > > to follow the dag code.
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <
> > > > > > > > > > hamlin.kn@gmail.com>
> > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > > I second Ash’s grouping concept.
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > On Fri, Jun 12, 2020 at 5:10 AM Ash
> > > Berlin-Taylor
> > > > <
> > > > > > > > > > > ash@apache.org
> > > > > > > > > > > > >
> > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > > Question:
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > Do we even need the SubDagOperator
> anymore?
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > Would removing it entirely and just
> > replacing
> > > it
> > > > > > with
> > > > > > > a
> > > > > > > > UI
> > > > > > > > > > > > >> grouping
> > > > > > > > > > > > >> > > > > concept be conceptually simpler, less to
> get
> > > > > wrong,
> > > > > > > and
> > > > > > > > > > closer
> > > > > > > > > > > > to
> > > > > > > > > > > > >> > what
> > > > > > > > > > > > >> > > > > users actually want to achieve with
> subdags?
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > > With your proposed change, tasks in subdags could start running in
> > > > > > > > > > > > > > >> > > > > parallel (a good change) -- so should we not also just _entirely_
> > > > > > > > > > > > > > >> > > > > remove the concept of a sub dag and replace it with something simpler?
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > > Problems with subdags (I think. I haven't used them extensively so
> > > > > > > > > > > > > > >> > > > > may be wrong on some of these):
> > > > > > > > > > > > > > >> > > > > - They need their own dag_id, but it has(?) to be of the form
> > > > > > > > > > > > > > >> > > > > `parent_dag_id.subdag_id`.
> > > > > > > > > > > > > > >> > > > > - They need their own schedule_interval, but it has to match the
> > > > > > > > > > > > > > >> > > > > parent dag
> > > > > > > > > > > > > > >> > > > > - Sub dags can be paused on their own. (Does it make sense to do
> > > > > > > > > > > > > > >> > > > > this? Pausing just a sub dag would mean the sub dag would never
> > > > > > > > > > > > > > >> > > > > execute, so the SubDagOperator would fail too.)
> > > > > > > > > > > > > > >> > > > > - You had to choose the executor to operate a subdag with -- always
> > > > > > > > > > > > > > >> > > > > a bit of a kludge.
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > Thoughts?
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > -ash
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > On Jun 12 2020, at 12:01 pm, Ash
> > > Berlin-Taylor <
> > > > > > > > > > > ash@apache.org>
> > > > > > > > > > > > >> > wrote:
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > > >> > > > > > Work on sub-dags is much needed, I'm excited to see how this
> > > > > > > > > > > > > > >> > > > > > progresses.
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > >> > > > > >> - *Unpack SubDags during dag parsing*: This rewrites the
> > > > > > > > > > > > > >> > > > > >> *DagBag.bag_dag* method to unpack subdag while parsing, and it
> > > > > > > > > > > > > >> > > > > >> will give a flat structure at the task level
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > >> > > > > > The serialized_dag representation already does this I think. At
> > > > > > > > > > > > > >> > > > > > least if I've understood your idea here correctly.
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > > -ash
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > > >> > > > > > On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > > > > > > > > >> > > > > >
> > > > > > > > > > > > >> > > > > >> Hi everyone,
> > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> Sending a message to everyone to collect feedback on AIP-34 on
> > > > > > > > > > > > > >> > > > > >> rewriting SubDagOperator. This was previously briefly mentioned in
> > > > > > > > > > > > > >> > > > > >> the discussion about what needs to be done for Airflow 2.0, and one
> > > > > > > > > > > > > >> > > > > >> of the ideas is to make SubDagOperator attach tasks back to the
> > > > > > > > > > > > > >> > > > > >> root DAG.
> > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> This AIP-34 focuses on solving SubDagOperator related issues by
> > > > > > > > > > > > > >> > > > > >> reattaching all tasks back to the root dag while respecting
> > > > > > > > > > > > > >> > > > > >> dependencies during parsing. The original grouping effect on the UI
> > > > > > > > > > > > > >> > > > > >> will be achieved through grouping related tasks by metadata.
> > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> This also makes the dag_factory function more reusable because you
> > > > > > > > > > > > > >> > > > > >> don't need to have parent_dag_name and child_dag_name in the
> > > > > > > > > > > > > >> > > > > >> function signature anymore.
> > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > >> > > > > >> Changes proposed:
> > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> - *Unpack SubDags during dag parsing*: This rewrites the
> > > > > > > > > > > > > >> > > > > >> *DagBag.bag_dag* method to unpack subdag while parsing, and it
> > > > > > > > > > > > > >> > > > > >> will give a flat structure at the task level
> > > > > > > > > > > > > >> > > > > >> - *Simplify SubDagOperator*: The new SubDagOperator acts like a
> > > > > > > > > > > > > >> > > > > >> container and most of the original methods are removed. The
> > > > > > > > > > > > > >> > > > > >> signature is also changed to *subdag_factory* with *subdag_args*
> > > > > > > > > > > > > >> > > > > >> and *subdag_kwargs*. This is similar to the PythonOperator
> > > > > > > > > > > > > >> > > > > >> signature.
> > > > > > > > > > > > > >> > > > > >> - *Add a TaskGroup model and add current_group & parent_group
> > > > > > > > > > > > > >> > > > > >> attributes to BaseOperator*: This metadata is used to group tasks
> > > > > > > > > > > > > >> > > > > >> for rendering at UI level. It may potentially extend further to
> > > > > > > > > > > > > >> > > > > >> group arbitrary tasks outside the context of subdag to allow
> > > > > > > > > > > > > >> > > > > >> group-level operations (i.e. stop/trigger a group of tasks within
> > > > > > > > > > > > > >> > > > > >> the dag)
> > > > > > > > > > > > > >> > > > > >> - *Webserver UI for SubDag*: Proposed UI modification to allow
> > > > > > > > > > > > > >> > > > > >> (un)collapsing a group of tasks for a flat structure to pair with
> > > > > > > > > > > > > >> > > > > >> the first change instead of the original hierarchical structure.
> > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> Please see related documents and PRs for details:
> > > > > > > > > > > > > >> > > > > >> AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> Original Issue: https://github.com/apache/airflow/issues/8078
> > > > > > > > > > > > > >> > > > > >> Draft PR: https://github.com/apache/airflow/pull/9243
> > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > > >> > > > > >> Please let me know if there are any aspects that you agree/disagree
> > > > > > > > > > > > > >> > > > > >> with or need more clarification (especially the third change
> > > > > > > > > > > > > >> > > > > >> regarding TaskGroup). Any comments are welcome and I am looking
> > > > > > > > > > > > > >> > > > > >> forward to it!
> > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > >> > > > > >> Cheers
> > > > > > > > > > > > >> > > > > >> Bin
> > > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > >> > > > Kyle Hamlin
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Thanks & Regards
> > > > > > > > > > > > Poornima
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >
> > > > M: +48 660 796 129
> > > >
> > >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129
> >
>
>
> --
>
> *Jacob Ferriero*
>
> Strategic Cloud Engineer: Data Engineering
>
> jferriero@google.com
>
> 617-714-2509
>

Re: [AIP-34] Rewrite SubDagOperator

Posted by Jacob Ferriero <jf...@google.com.INVALID>.
I really like this idea of a TaskGroup container as I think this will be
much easier to use than SubDag.

I'd like to propose an optional behavior for special retry mechanics via a
TaskGroup.retry_all property.
This way I could use TaskGroup to replace my favorite use of SubDag:
atomically retrying tasks of the pattern "act on external state, then
reschedule and poll until the desired state is reached".

The motivating use case I have for a SubDag is a very simple two-task group
[SubmitLongRunningJobTask >> PollJobStatusSensor].
I use SubDag because it gives me an easy way to retry the SubmitJobTask
if something about the PollJobSensor fails.
This pattern would be really nice for jobs that are expected to run a long
time (because the sensor can use reschedule mode, freeing up slots)
but might fail for a retryable reason.
However, using SubDag to meet this use case defeats the purpose because
SubDag infamously
<https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
blocks a "controller" slot for the entire duration.
This may feel like cyclic behavior, but in reality it is very common for a
single operator to submit a job and wait until it is done.
We could use this to refactor many operators (e.g. BQ, Dataproc,
Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask] with an
optional reschedule mode if the user knows that the job may take a long time.
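
To make the retry semantics I have in mind a bit more concrete, here is a
rough sketch in plain Python. It is not Airflow code and nothing like this
exists yet. It assumes retry_all would mean "if any task in the group fails,
restart the whole group from its first task". The function name
run_group_with_retry_all and the max_group_retries parameter are made up
for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GroupRunResult:
    succeeded: bool
    attempts: int
    log: List[str] = field(default_factory=list)


def run_group_with_retry_all(tasks: List[Callable[[], None]],
                             max_group_retries: int = 2) -> GroupRunResult:
    """Run the tasks in order; on any failure, restart from the first task.

    Hypothetical helper: a stand-in for what a scheduler might do for a
    group flagged with retry_all.
    """
    log: List[str] = []
    total_attempts = max_group_retries + 1
    for attempt in range(1, total_attempts + 1):
        failed = False
        for task in tasks:
            try:
                task()
                log.append(f"attempt {attempt}: {task.__name__} ok")
            except Exception as exc:  # any failure fails the whole group
                log.append(f"attempt {attempt}: {task.__name__} failed ({exc})")
                failed = True
                break
        if not failed:
            return GroupRunResult(True, attempt, log)
    return GroupRunResult(False, total_attempts, log)


# Toy version of [SubmitTask >> PollTask]: the first submitted job "fails"
# while being polled, so the group is retried and the job is resubmitted.
state = {"submissions": 0}


def submit_job():
    state["submissions"] += 1


def poll_job():
    if state["submissions"] < 2:
        raise RuntimeError("job ended in a retryable failure")


result = run_group_with_retry_all([submit_job, poll_job], max_group_retries=1)
```

The point is only that a failure in the poll step causes the submit step to
run again, which is what I get from SubDag today.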

I'd be happy to do the development work on adding this specific retry behavior
to TaskGroup once the base concept is implemented, if others in the
community would find this a useful feature.

Cheers,
Jake

On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> All for it :) . I think we are getting closer to have regular planning and
> making some structured approach to 2.0 and starting task force for it soon,
> so I think this should be perfectly fine to discuss and even start
> implementing what's beyond as soon as we make sure that we are prioritizing
> 2.0 work.
>
> J,
>
>
> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yu...@gmail.com> wrote:
>
> > Hi Jarek,
> >
> > I agree we should not change the behaviour of the existing SubDagOperator
> > till Airflow 2.1. Is it okay to continue the discussion about TaskGroup as
> > a brand new concept/feature independent from the existing SubDagOperator?
> > In other words, shall we add TaskGroup as a UI grouping concept like Ash
> > suggested, and not touch SubDagOperator at all? Whenever we are ready with
> > TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
> >
> > I really like Ash's idea of simplifying the SubDagOperator idea into a
> > simple UI grouping concept. I think Xinbin's idea of "reattaching all the
> > tasks to the root DAG" is the way to go. And I see James pointed out we
> > need some helper functions to simplify dependencies setting of TaskGroup.
> > Xinbin put up a pretty elegant example in his PR
> > <https://github.com/apache/airflow/pull/9243>. I think having TaskGroup as
> > a UI concept should be a relatively small change. We can simplify Xinbin's
> > PR further. So I put up this alternative proposal here:
> > https://github.com/apache/airflow/pull/10153
> >
> > I have not done any UI changes due to lack of experience with web UI. If
> > anyone's interested, please take a look at the PR.
> >
> > Qian
> >
> > On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <Ja...@polidea.com>
> > wrote:
> >
> > > Similar point here to the other ideas that are popping up. Maybe we should
> > > just focus on completing 2.0 and move all discussions about further
> > > improvements to 2.1? While those are important discussions (and we should
> > > continue them in the near future!), I think delivering 2.0 in its current
> > > shape should be our focus now.
> > >
> > > J.
> > >
> > > On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <bi...@gmail.com>
> > > wrote:
> > >
> > > > Hi Daniel
> > > >
> > > > I agree that the TaskGroup should have the same API as a DAG object related
> > > > to task dependencies, but it will not have anything related to actual
> > > > execution or scheduling.
> > > > I will update the AIP according to this over the weekend.
> > > >
> > > > > We could even make a “DAGTemplate” object s.t. when you import the object
> > > > > you can import it with parameters to determine the shape of the DAG.
> > > >
> > > > Can you elaborate a bit more on this? Does it serve a similar purpose as a
> > > > DAG factory function?
> > > >
> > > >
> > > >
> > > > On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <daniel.imberman@gmail.com> wrote:
> > > >
> > > > > Hi Bin,
> > > > >
> > > > > Why not give the TaskGroup the same API as a DAG object (e.g. the bitwise
> > > > > operator for task dependencies)? We could even make a “DAGTemplate” object
> > > > > s.t. when you import the object you can import it with parameters to
> > > > > determine the shape of the DAG.
> > > > >
> > > > >
> > > > > On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > The TaskGroup will not take a schedule interval as a parameter itself; it
> > > > > depends on the DAG it is attached to. In my opinion, the TaskGroup will
> > > > > only contain a group of tasks with interdependencies, and the TaskGroup
> > > > > behaves like a task. It doesn't contain any execution/scheduling logic
> > > > > (i.e. schedule_interval, concurrency, max_active_runs etc.) like a DAG
> > > > > does.
> > > > >
> > > > > > For example, there is the scenario that the schedule interval of DAG is
> > > > > > 1 hour and the schedule interval of TaskGroup is 20 min.
> > > > >
> > > > > I am curious why you ask this. Is this a use case that you want to achieve?
> > > > >
> > > > > Bin
> > > > >
> > > > > > On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <th...@gmail.com> wrote:
> > > > >
> > > > > > Hi Bin,
> > > > > > > Using TaskGroup, is the schedule interval of the TaskGroup the same as
> > > > > > > that of the parent DAG? My main concern is whether the schedule interval
> > > > > > > of a TaskGroup could be different from that of the DAG. For example,
> > > > > > > there is the scenario that the schedule interval of DAG is 1 hour and the
> > > > > > > schedule interval of TaskGroup is 20 min.
> > > > > >
> > > > > > Cheers,
> > > > > > Nicholas
> > > > > >
> > > > > > > On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Nicholas,
> > > > > > >
> > > > > > > > I am not sure about the old behavior of SubDagOperator, maybe it will
> > > > > > > > throw an error? But in the original proposal, the subdag's
> > > > > > > > schedule_interval will be ignored. Or if we decide to use TaskGroup to
> > > > > > > > replace SubDag, there will be no subdag schedule_interval.
> > > > > > >
> > > > > > > Bin
> > > > > > >
> > > > > > > > On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <thanosxnicholas@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > > > Thanks for your good proposal. I was unsure whether the schedule
> > > > > > > > > interval of a SubDAG can be different from that of the parent DAG. I
> > > > > > > > > have discussed the schedule interval of SubDAG with Jiajie Zhong. If
> > > > > > > > > the SubDagOperator has a different schedule interval, what happens
> > > > > > > > > when the scheduler schedules the parent DAG?
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Nicholas Jiang
> > > > > > > >
> > > > > > > > > On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > > Thank you Max, Kaxil, and everyone for the feedback!
> > > > > > > > > >
> > > > > > > > > > I have rethought the concept of subdag and task groups. I think the
> > > > > > > > > > better way to approach this is to entirely remove subdag and introduce
> > > > > > > > > > the concept of TaskGroup, which is a container of tasks along with
> > > > > > > > > > their dependencies *without the execution/scheduling logic of a DAG*.
> > > > > > > > > > Its only purpose is to group a list of tasks, but you still need to
> > > > > > > > > > add it to a DAG for execution.
> > > > > > > > >
> > > > > > > > > Here is a small code snippet.
> > > > > > > > >
> > > > > > > > > ```
> > > > > > > > > > class TaskGroup:
> > > > > > > > > >     """
> > > > > > > > > >     A TaskGroup contains a group of tasks.
> > > > > > > > > >
> > > > > > > > > >     If default_args is missing, it will take default args from the DAG.
> > > > > > > > > >     """
> > > > > > > > > >     def __init__(self, group_id, default_args=None):
> > > > > > > > > >         pass
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > """
> > > > > > > > > > You can add tasks to a task group similar to adding tasks to a DAG
> > > > > > > > > >
> > > > > > > > > > This can be declared in a separate file from the dag file
> > > > > > > > > > """
> > > > > > > > > > download_group = TaskGroup(group_id='download', default_args=default_args)
> > > > > > > > > > download_group.add_task(task1)
> > > > > > > > > > task2.dag = download_group
> > > > > > > > > >
> > > > > > > > > > with download_group:
> > > > > > > > > >     task3 = DummyOperator(task_id='task3')
> > > > > > > > > >
> > > > > > > > > > [task1, task2] >> task3
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > """Add it to a DAG for execution"""
> > > > > > > > > > with DAG(dag_id='start_download_dag', default_args=default_args,
> > > > > > > > > >          schedule_interval='@daily', ...) as dag:
> > > > > > > > > >     start = DummyOperator(task_id='start')
> > > > > > > > > >     start >> download_group
> > > > > > > > > >     # this is equivalent to
> > > > > > > > > >     # start >> [task1, task2] >> task3
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > > With this, we can still reuse a group of tasks and set dependencies
> > > > > > > > > > between them; it avoids the boilerplate code from using
> > > > > > > > > > SubDagOperator, and we can declare dependencies as
> > > > > > > > > > `task >> task_group >> task`.
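
One way `task >> task_group >> task` could resolve to plain task-level
dependencies is sketched below in standalone Python. This is only an
illustration under assumptions, not the API from either PR: the Node, Task
and TaskGroup classes and the roots()/leaves() helpers are hypothetical. A
group's "roots" are its tasks with no upstream inside the group, and its
"leaves" are its tasks with no downstream inside the group.

```python
class Node:
    """Anything that can appear on either side of `>>` (hypothetical)."""

    def roots(self):
        raise NotImplementedError

    def leaves(self):
        raise NotImplementedError

    def __rshift__(self, other):
        # Link every leaf on the left to every root on the right.
        for up in self.leaves():
            for down in other.roots():
                up.downstream.add(down.task_id)
                down.upstream.add(up.task_id)
        return other  # returning the right side allows a >> b >> c


class Task(Node):
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def roots(self):
        return [self]

    def leaves(self):
        return [self]


class TaskGroup(Node):
    """Pure container: it holds tasks and has no scheduling logic."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = {}

    def add_task(self, task):
        self.tasks[task.task_id] = task
        return task

    def roots(self):
        # Tasks with no upstream task inside this group.
        return [t for t in self.tasks.values()
                if t.upstream.isdisjoint(self.tasks)]

    def leaves(self):
        # Tasks with no downstream task inside this group.
        return [t for t in self.tasks.values()
                if t.downstream.isdisjoint(self.tasks)]


download = TaskGroup("download")
task1 = download.add_task(Task("task1"))
task2 = download.add_task(Task("task2"))
task3 = download.add_task(Task("task3"))
task1 >> task3
task2 >> task3

start, end = Task("start"), Task("end")
start >> download >> end
```

After this runs, `start` points at task1 and task2, and task3 points at
`end`, so the group itself never has to be scheduled.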
> > > > > > > > >
> > > > > > > > > > User migration wise, we can introduce it before Airflow 2.0 and allow
> > > > > > > > > > gradual transition. Then we can decide if we still want to keep the
> > > > > > > > > > SubDagOperator or simply remove it.
> > > > > > > > >
> > > > > > > > > Any thoughts?
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Bin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > > > > > > > > maximebeauchemin@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > +1, proposal looks good.
> > > > > > > > > >
> > > > > > > > > > > The original intention was really to have task groups and a
> > > > > > > > > > > zoom-in/out in the UI. The original reasoning was to reuse the DAG
> > > > > > > > > > > object since it is a group of tasks, but as highlighted here it does
> > > > > > > > > > > create underlying confusion since a DAG is much more than just a
> > > > > > > > > > > group of tasks.
> > > > > > > > > >
> > > > > > > > > > Max
> > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <joshipoornima06@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Thank you for your email.
> > > > > > > > > > >
> > > > > > > > > > > > On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > > > - *Unpack SubDags during dag parsing*: This rewrites the
> > > > > > > > > > > > > > > *DagBag.bag_dag* method to unpack subdag while parsing, and it
> > > > > > > > > > > > > > > will give a flat structure at the task level
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The serialized_dag representation already does this I think. At
> > > > > > > > > > > > > > least if I've understood your idea here correctly.
> > > > > > > > > > > >
> > > > > > > > > > > > > I am not sure about the serialized_dag representation, but at least
> > > > > > > > > > > > > it will still keep the subdag entry in the DAG table? In my
> > > > > > > > > > > > > proposal, as in the draft PR, the idea is to *extract the tasks from
> > > > > > > > > > > > > the subdag and add them back to the root_dag.* So the runtime DAG
> > > > > > > > > > > > > graph will look exactly the same as without subdag but with metadata
> > > > > > > > > > > > > attached to those sections. This metadata will later be used for
> > > > > > > > > > > > > rendering in the UI. So after parsing (*DagBag.process_file()*), it
> > > > > > > > > > > > > will just output the *root_dag* instead of *root_dag + subdag +
> > > > > > > > > > > > > subdag + nested subdag* etc.
> > > > > > > > > > > >
> > > > > > > > > > > > > - e.g. section-1-* will have metadata current_group=section-1,
> > > > > > > > > > > > > parent_group=<the-root-dag-id> (naming suggestions are welcome). The
> > > > > > > > > > > > > reason for parent_group is that we can have nested groups and still
> > > > > > > > > > > > > be able to capture the dependency.
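
As a rough illustration of how a UI could rebuild the grouping from a flat
task list using only this metadata, here is a small standalone Python sketch.
The tuple layout, the index_groups helper and the example ids are all
assumptions made for illustration; in particular it assumes that tasks sitting
directly in the root DAG carry the DAG id as current_group and no parent_group.

```python
from collections import defaultdict


def index_groups(tasks):
    """Index flat tasks by their group metadata (hypothetical helper).

    `tasks` is an iterable of (task_id, current_group, parent_group) tuples.
    Returns (members, children): the task ids in each group, and the child
    groups nested under each parent group.
    """
    members = defaultdict(list)
    children = defaultdict(set)
    for task_id, current_group, parent_group in tasks:
        members[current_group].append(task_id)
        if parent_group is not None:
            children[parent_group].add(current_group)
    return members, children


flat_tasks = [
    ("start", "example_dag", None),
    ("section-1-task-1", "section-1", "example_dag"),
    ("section-1-task-2", "section-1", "example_dag"),
    ("section-2-task-1", "section-2", "section-1"),  # nested group
]
members, children = index_groups(flat_tasks)
```

The scheduler only ever sees the flat list; the two indexes are all the UI
would need to collapse or expand a section.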
> > > > > > > > > > > >
> > > > > > > > > > > > Runtime DAG:
> > > > > > > > > > > > [image: image.png]
> > > > > > > > > > > >
> > > > > > > > > > > > > While at the UI, what we see would be something like this by
> > > > > > > > > > > > > utilizing the metadata, and then we can expand or zoom into in some
> > > > > > > > > > > > > way.
> > > > > > > > > > > > [image: image.png]
> > > > > > > > > > > >
> > > > > > > > > > > > > The benefits I can see are:
> > > > > > > > > > > > > 1. We don't need to deal with the extra complexity of SubDag for
> > > > > > > > > > > > > execution and scheduling. It will be the same as not using SubDag.
> > > > > > > > > > > > > 2. We still have the benefits of modularized and reusable dag code
> > > > > > > > > > > > > and can declare dependencies between them. And with the new
> > > > > > > > > > > > > SubDagOperator (see AIP or draft PR), we can use the same
> > > > > > > > > > > > > dag_factory function for generating 1 dag, a lot of dynamic dags, or
> > > > > > > > > > > > > a SubDag (in this case, it will just extract all underlying tasks
> > > > > > > > > > > > > and append them to the root dag).
> > > > > > > > > > > >
> > > > > > > > > > > > > - Then it gets to the idea of replacing subdag with a simpler
> > > > > > > > > > > > > concept by Ash: the proposed change basically drains out the
> > > > > > > > > > > > > contents of a SubDag and becomes more like
> > > > > > > > > > > > > ExtractSubdagTasksAndAppendToRootdagOperator (forgive me about the
> > > > > > > > > > > > > crazy name..). In this case, is it still necessary to keep the
> > > > > > > > > > > > > concept of subdag, as it is nothing more than a name?
> > > > > > > > > > > >
> > > > > > > > > > > > > That's why the TaskGroup idea comes up. Thanks Chris Palmer for
> > > > > > > > > > > > > helping conceptualize the functionality of TaskGroup, I will just
> > > > > > > > > > > > > paste it here.
> > > > > > > > > > > >
> > > > > > > > > > > > > > - Tasks can be added to a TaskGroup
> > > > > > > > > > > > > > - You *can* have dependencies between Tasks in the same TaskGroup,
> > > > > > > > > > > > > > but *cannot* have dependencies between a Task in a TaskGroup and
> > > > > > > > > > > > > > either a Task in a different TaskGroup or a Task not in any group
> > > > > > > > > > > > > > - You *can* have dependencies between a TaskGroup and either other
> > > > > > > > > > > > > > TaskGroups or Tasks not in any group
> > > > > > > > > > > > > > - The UI will by default render a TaskGroup as a single "object",
> > > > > > > > > > > > > > but which you expand or zoom into in some way
> > > > > > > > > > > > > > - You'd need some way to determine what the "status" of a TaskGroup
> > > > > > > > > > > > > > was at least for UI display purposes
> > > > > > > > > > > >
> > > > > > > > > > > > > I agree with Chris:
> > > > > > > > > > > > > - From the backend's view (scheduler & executor), I think TaskGroup
> > > > > > > > > > > > > should be ignored during execution (unless we decide to implement
> > > > > > > > > > > > > some metadata operations that allow starting/stopping a group of
> > > > > > > > > > > > > tasks etc.)
> > > > > > > > > > > > > - From the UI's view, it should be able to pick up the individual
> > > > > > > > > > > > > tasks' status and then determine the TaskGroup's status
> > > > > > > > > > > > Bin
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman <
> > > > > > > > > > > > daniel.imberman@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > >> I hadn’t thought about using the `>>` operator to tie dags together
> > > > > > > > > > > > >> but I think that sounds pretty great! I wonder if we could
> > > > > > > > > > > > >> essentially write in the ability to set dependencies to all
> > > > > > > > > > > > >> starter-tasks for that DAG.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> I’m personally ok with SubDag being a mostly UI concept. It doesn’t
> > > > > > > > > > > > >> need to execute separately, you’re just adding more tasks to the
> > > > > > > > > > > > >> queue that will be executed when there are resources available.
> > > > > > > > > > > >>
> > > > > > > > > > > > >> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <chris@crpalmer.com> wrote:
> > > > > > > > > > > > >> I agree that SubDAGs are an overly complex abstraction. I think what
> > > > > > > > > > > > >> is needed/useful is a TaskGroup concept. On a high level I think you
> > > > > > > > > > > > >> want this functionality:
> > > > > > > > > > > >>
> > > > > > > > > > > > >> - Tasks can be added to a TaskGroup
> > > > > > > > > > > > >> - You *can* have dependencies between Tasks in the same TaskGroup,
> > > > > > > > > > > > >> but *cannot* have dependencies between a Task in a TaskGroup and
> > > > > > > > > > > > >> either a Task in a different TaskGroup or a Task not in any group
> > > > > > > > > > > > >> - You *can* have dependencies between a TaskGroup and either other
> > > > > > > > > > > > >> TaskGroups or Tasks not in any group
> > > > > > > > > > > > >> - The UI will by default render a TaskGroup as a single "object",
> > > > > > > > > > > > >> but which you expand or zoom into in some way
> > > > > > > > > > > > >> - You'd need some way to determine what the "status" of a TaskGroup
> > > > > > > > > > > > >> was at least for UI display purposes
> > > > > > > > > > > >>
> > > > > > > > > > > > >> Not sure if it would need to be a top level object with its own
> > > > > > > > > > > > >> database table and model or just another attribute on tasks. I think
> > > > > > > > > > > > >> you could build it in a way such that from the scheduler's point of
> > > > > > > > > > > > >> view a DAG with TaskGroups doesn't get treated any differently. So it
> > > > > > > > > > > > >> really just becomes a shortcut for setting dependencies between sets
> > > > > > > > > > > > >> of Tasks, and allows the UI to simplify the rendering of the DAG
> > > > > > > > > > > > >> structure.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Chris
> > > > > > > > > > > >>
> > > > > > > > > > > > >> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov <ddavydov@twitter.com.invalid> wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >> > Agree with James (and think it's actually the more
> > > > > important
> > > > > > > > issue
> > > > > > > > > > to
> > > > > > > > > > > >> fix),
> > > > > > > > > > > >> > but I am still convinced Ash' idea is the right
> way
> > > > > forward
> > > > > > > > (just
> > > > > > > > > it
> > > > > > > > > > > >> might
> > > > > > > > > > > >> > require a bit more work to deprecate than adding
> > > visual
> > > > > > > grouping
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > >> > UI).
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > There was a previous thread about this FYI with
> more
> > > > > context
> > > > > > > on
> > > > > > > > > why
> > > > > > > > > > > >> subdags
> > > > > > > > > > > >> > are bad and potential solutions:
> > > > > > > > > > > >> >
> > > > > > > >
> > > https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > > > > > > > . A
> > > > > > > > > > > >> > solution I outline there to Jame's problem is e.g.
> > > > > enabling
> > > > > > > the
> > > > > > > > >>
> > > > > > > > > > > >> operator
> > > > > > > > > > > >> > for Airflow operators to work with DAGs as well. I
> > see
> > > > > this
> > > > > > > > being
> > > > > > > > > > > >> separate
> > > > > > > > > > > > >> > from Ash's solution for DAG grouping in the UI but
> > one
> > > of
> > > > > the
> > > > > > > two
> > > > > > > > > > items
> > > > > > > > > > > >> > required to replace all existing subdag
> > functionality.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > I've been working with subdags for 3 years and
> they
> > > are
> > > > > > > always a
> > > > > > > > > > giant
> > > > > > > > > > > >> pain
> > > > > > > > > > > >> > to use. They are a constant source of user
> confusion
> > > and
> > > > > > > > breakages
> > > > > > > > > > > >> during
> > > > > > > > > > > >> > upgrades. Would love to see them gone :).
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > On Fri, Jun 12, 2020 at 11:11 AM James Coder <
> > > > > > > > jcoder01@gmail.com>
> > > > > > > > > > > >> wrote:
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > > I'm not sure I totally agree it's just a UI
> > > concept. I
> > > > > use
> > > > > > > the
> > > > > > > > > > > subdag
> > > > > > > > > > > >> > > operator to simplify dependencies too. If you
> > have a
> > > > > group
> > > > > > > of
> > > > > > > > > > tasks
> > > > > > > > > > > >> that
> > > > > > > > > > > >> > > need to finish before another group of tasks
> > start,
> > > > > using
> > > > > > a
> > > > > > > > > subdag
> > > > > > > > > > > is
> > > > > > > > > > > >> a
> > > > > > > > > > > >> > > pretty quick way to set those dependencies and I
> > > think
> > > > > > also
> > > > > > > > make
> > > > > > > > > > it
> > > > > > > > > > > >> > easier
> > > > > > > > > > > >> > > to follow the dag code.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <
> > > > > > > > > hamlin.kn@gmail.com>
> > > > > > > > > > > >> wrote:
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > > I second Ash’s grouping concept.
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > On Fri, Jun 12, 2020 at 5:10 AM Ash
> > Berlin-Taylor
> > > <
> > > > > > > > > > ash@apache.org
> > > > > > > > > > > >
> > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > > Question:
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > Do we even need the SubDagOperator anymore?
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > Would removing it entirely and just
> replacing
> > it
> > > > > with
> > > > > > a
> > > > > > > UI
> > > > > > > > > > > >> grouping
> > > > > > > > > > > >> > > > > concept be conceptually simpler, less to get
> > > > wrong,
> > > > > > and
> > > > > > > > > closer
> > > > > > > > > > > to
> > > > > > > > > > > >> > what
> > > > > > > > > > > >> > > > > users actually want to achieve with subdags?
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > With your proposed change, tasks in subdags
> > > could
> > > > > > start
> > > > > > > > > > running
> > > > > > > > > > > in
> > > > > > > > > > > >> > > > > parallel (a good change) -- so should we not
> > > also
> > > > > just
> > > > > > > > > > > > _entirely_
> > > > > > > > > > > >> > > remove
> > > > > > > > > > > >> > > > > the concept of a sub dag and replace it with
> > > > > something
> > > > > > > > > > simpler.
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > Problems with subdags (I think. I haven't
> used
> > > > them
> > > > > > > > > > extensively
> > > > > > > > > > > so
> > > > > > > > > > > >> > may
> > > > > > > > > > > >> > > > > be wrong on some of these):
> > > > > > > > > > > >> > > > > - They need their own dag_id, but it has(?)
> to
> > > be
> > > > of
> > > > > > the
> > > > > > > > > form
> > > > > > > > > > > >> > > > > `parent_dag_id.subdag_id`.
> > > > > > > > > > > >> > > > > - They need their own schedule_interval, but
> > it
> > > > has
> > > > > to
> > > > > > > > match
> > > > > > > > > > the
> > > > > > > > > > > >> > parent
> > > > > > > > > > > >> > > > dag
> > > > > > > > > > > >> > > > > - Sub dags can be paused on their own. (Does
> > it
> > > > make
> > > > > > > sense
> > > > > > > > > to
> > > > > > > > > > do
> > > > > > > > > > > >> > this?
> > > > > > > > > > > >> > > > > Pausing just a sub dag would mean the sub
> dag
> > > > would
> > > > > > > never
> > > > > > > > > > > >> execute, so
> > > > > > > > > > > >> > > > > the SubDagOperator would fail too.
> > > > > > > > > > > >> > > > > - You had to choose the executor to
> operate a
> > > > > subdag
> > > > > > > with
> > > > > > > > > --
> > > > > > > > > > > >> always
> > > > > > > > > > > >> > a
> > > > > > > > > > > >> > > > > bit of a kludge.
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > Thoughts?
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > -ash
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > On Jun 12 2020, at 12:01 pm, Ash
> > Berlin-Taylor <
> > > > > > > > > > ash@apache.org>
> > > > > > > > > > > >> > wrote:
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > > Work on sub-dags is much needed, I'm
> excited
> > to
> > > > see
> > > > > > how
> > > > > > > > > this
> > > > > > > > > > > >> > > progresses.
> > > > > > > > > > > >> > > > > >
> > > > > > > > > > > >> > > > > >
> > > > > > > > > > > >> > > > > >> - *Unpack SubDags during dag parsing*:
> This
> > > > > > rewrites
> > > > > > > > the
> > > > > > > > > > > >> > > > > *DagBag.bag_dag*
> > > > > > > > > > > >> > > > > >> method to unpack subdag while parsing,
> and
> > it
> > > > > will
> > > > > > > > give a
> > > > > > > > > > > flat
> > > > > > > > > > > >> > > > > >> structure at
> > > > > > > > > > > >> > > > > >> the task level
> > > > > > > > > > > >> > > > > >
> > > > > > > > > > > >> > > > > > The serialized_dag representation already
> > does
> > > > > this
> > > > > > I
> > > > > > > > > think.
> > > > > > > > > > > At
> > > > > > > > > > > >> > least
> > > > > > > > > > > >> > > > if
> > > > > > > > > > > >> > > > > > I've understood your idea here correctly.
> > > > > > > > > > > >> > > > > >
> > > > > > > > > > > >> > > > > > -ash
> > > > > > > > > > > >> > > > > >
> > > > > > > > > > > >> > > > > >
> > > > > > > > > > > >> > > > > > On Jun 12 2020, at 9:51 am, Xinbin Huang <
> > > > > > > > > > > bin.huangxb@gmail.com
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > > > wrote:
> > > > > > > > > > > >> > > > > >
> > > > > > > > > > > >> > > > > >> Hi everyone,
> > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > >> > > > > >> Sending a message to everyone to collect
> > > > > feedback
> > > > > > on
> > > > > > > > the
> > > > > > > > > > > >> AIP-34
> > > > > > > > > > > >> > on
> > > > > > > > > > > >> > > > > >> rewriting SubDagOperator. This was
> > previously
> > > > > > briefly
> > > > > > > > > > > >> mentioned in
> > > > > > > > > > > >> > > the
> > > > > > > > > > > >> > > > > >> discussion about what needs to be done
> for
> > > > > Airflow
> > > > > > > 2.0,
> > > > > > > > > and
> > > > > > > > > > > >> one of
> > > > > > > > > > > >> > > the
> > > > > > > > > > > >> > > > > >> ideas is to make SubDagOperator attach
> > tasks
> > > > back
> > > > > > to
> > > > > > > > the
> > > > > > > > > > root
> > > > > > > > > > > >> DAG.
> > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > >> > > > > >> This AIP-34 focuses on solving
> > SubDagOperator
> > > > > > related
> > > > > > > > > > issues
> > > > > > > > > > > by
> > > > > > > > > > > >> > > > > reattaching
> > > > > > > > > > > >> > > > > >> all tasks back to the root dag while
> > > respecting
> > > > > > > > > > dependencies
> > > > > > > > > > > >> > during
> > > > > > > > > > > >> > > > > >> parsing. The original grouping effect on
> > the
> > > UI
> > > > > > will
> > > > > > > be
> > > > > > > > > > > >> achieved
> > > > > > > > > > > >> > > > through
> > > > > > > > > > > >> > > > > >> grouping related tasks by metadata.
> > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > >> > > > > >> This also makes the dag_factory function
> > more
> > > > > > > reusable
> > > > > > > > > > > because
> > > > > > > > > > > >> you
> > > > > > > > > > > >> > > > don't
> > > > > > > > > > > >> > > > > >> need to have parent_dag_name and
> > > child_dag_name
> > > > > in
> > > > > > > the
> > > > > > > > > > > function
> > > > > > > > > > > >> > > > > signature
> > > > > > > > > > > >> > > > > >> anymore.
> > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > >> > > > > >> Changes proposed:
> > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > >> > > > > >> - *Unpack SubDags during dag parsing*:
> This
> > > > > > rewrites
> > > > > > > > the
> > > > > > > > > > > >> > > > > *DagBag.bag_dag*
> > > > > > > > > > > >> > > > > >> method to unpack subdag while parsing,
> and
> > it
> > > > > will
> > > > > > > > give a
> > > > > > > > > > > flat
> > > > > > > > > > > >> > > > > >> structure at
> > > > > > > > > > > >> > > > > >> the task level
> > > > > > > > > > > >> > > > > >> - *Simplify SubDagOperator*: The new
> > > > > SubDagOperator
> > > > > > > > acts
> > > > > > > > > > > like a
> > > > > > > > > > > >> > > > > >> container and most of the original
> methods
> > > are
> > > > > > > removed.
> > > > > > > > > The
> > > > > > > > > > > >> > > > > >> signature is
> > > > > > > > > > > >> > > > > >> also changed to *subdag_factory *with
> > > > > *subdag_args
> > > > > > > *and
> > > > > > > > > > > >> > > > > *subdag_kwargs*.
> > > > > > > > > > > >> > > > > >> This is similar to the PythonOperator
> > > > signature.
> > > > > > > > > > > >> > > > > >> - *Add a TaskGroup model and add
> > > current_group
> > > > &
> > > > > > > > > > parent_group
> > > > > > > > > > > >> > > > > attributes
> > > > > > > > > > > >> > > > > >> to BaseOperator*: This metadata is used
> to
> > > > group
> > > > > > > tasks
> > > > > > > > > for
> > > > > > > > > > > >> > > > > >> rendering at
> > > > > > > > > > > >> > > > > >> UI level. It may potentially extend
> further
> > > to
> > > > > > group
> > > > > > > > > > > arbitrary
> > > > > > > > > > > >> > > tasks
> > > > > > > > > > > >> > > > > >> outside the context of subdag to allow
> > > > > group-level
> > > > > > > > > > operations
> > > > > > > > > > > >> > > (i.e.
> > > > > > > > > > > >> > > > > >> stop/trigger a group of task within the
> > dag)
> > > > > > > > > > > >> > > > > >> - *Webserver UI for SubDag*: Proposed UI
> > > > > > modification
> > > > > > > > to
> > > > > > > > > > > allow
> > > > > > > > > > > >> > > > > >> (un)collapse a group of tasks for a flat
> > > > > structure
> > > > > > to
> > > > > > > > > pair
> > > > > > > > > > > with
> > > > > > > > > > > >> > > the
> > > > > > > > > > > >> > > > > first
> > > > > > > > > > > >> > > > > >> change instead of the original
> hierarchical
> > > > > > > structure.
> > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > >> > > > > >> Please see related documents and PRs for
> > > > details:
> > > > > > > > > > > > >> > > > > >> AIP:
> > > > > > > > > > > > >> > > > > >> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > > >> > > > > >> Original Issue: https://github.com/apache/airflow/issues/8078
> > > > > > > > > > > > >> > > > > >> Draft PR: https://github.com/apache/airflow/pull/9243
> > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > >> > > > > >> Please let me know if there are any
> aspects
> > > > that
> > > > > > you
> > > > > > > > > > > >> > agree/disagree
> > > > > > > > > > > >> > > > > >> with or
> > > > > > > > > > > >> > > > > >> need more clarification (especially the
> > third
> > > > > > change
> > > > > > > > > > > regarding
> > > > > > > > > > > >> > > > > TaskGroup).
> > > > > > > > > > > >> > > > > >> Any comments are welcome and I am looking
> > > > forward
> > > > > > to
> > > > > > > > it!
> > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > >> > > > > >> Cheers
> > > > > > > > > > > >> > > > > >> Bin
> > > > > > > > > > > >> > > > > >>
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > --
> > > > > > > > > > > >> > > > Kyle Hamlin
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Thanks & Regards
> > > > > > > > > > > Poornima
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >
> > > M: +48 660 796 129
> > >
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
>


-- 

*Jacob Ferriero*

Strategic Cloud Engineer: Data Engineering

jferriero@google.com

617-714-2509

Re: [AIP-34] Rewrite SubDagOperator

Posted by Jarek Potiuk <Ja...@polidea.com>.
All for it :). I think we are getting closer to having regular planning and
a structured approach to 2.0, and to starting a task force for it soon, so I
think it should be perfectly fine to discuss, and even start implementing,
what's beyond as soon as we make sure that we are prioritizing 2.0 work.

J.


On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yu...@gmail.com> wrote:

> Hi Jarek,
>
> I agree we should not change the behaviour of the existing SubDagOperator
> till Airflow 2.1. Is it okay to continue the discussion about TaskGroup as
> a brand new concept/feature independent from the existing SubDagOperator?
> In other words, shall we add TaskGroup as a UI grouping concept like Ash
> suggested, and not touch SubDagOperator at all? Whenever we are ready with
> TaskGroup, we then deprecate SubDagOperator in Airflow 2.1.
>
> I really like Ash's idea of simplifying the SubDagOperator idea into a
> simple UI grouping concept. I think Xinbin's idea of "reattaching all the
> tasks to the root DAG" is the way to go. And I see James pointed out we
> need some helper functions to simplify dependencies setting of TaskGroup.
> Xinbin put up a pretty elegant example in his PR
> <https://github.com/apache/airflow/pull/9243>. I think having TaskGroup as
> a UI concept should be a relatively small change. We can simplify Xinbin's
> PR further. So I put up this alternative proposal here:
> https://github.com/apache/airflow/pull/10153
>
> I have not done any UI changes due to lack of experience with web UI. If
> anyone's interested, please take a look at the PR.
>
> Qian
>
> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> > Similar point here to the other ideas that are popping up. Maybe we
> should
> > just focus on completing 2.0 and make all discussions about further
> > improvements to 2.1? While those are important discussions (and we should
> > continue them in the  near future !) I think at this point focusing on
> > delivering 2.0 in its current shape should be our focus now ?
> >
> > J.
> >
> > On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <bi...@gmail.com>
> > wrote:
> >
> > > Hi Daniel
> > >
> > > I agree that the TaskGroup should have the same API as a DAG object
> > related
> > > to task dependencies, but it will not have anything related to actual
> > > execution or scheduling.
> > > I will update the AIP according to this over the weekend.
> > >
> > > > We could even make a “DAGTemplate” object s.t. when you import the
> > object
> > > you can import it with parameters to determine the shape of the DAG.
> > >
> > > Can you elaborate a bit more on this? Does it serve a similar purpose
> as
> > a
> > > DAG factory function?
> > >
> > >
> > >
> > > On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <
> > daniel.imberman@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi Bin,
> > > >
> > > > Why not give the TaskGroup the same API as a DAG object (e.g. the
> > bitwise
> > > > > operator for task dependencies)? We could even make a “DAGTemplate”
> > > object
> > > > s.t. when you import the object you can import it with parameters to
> > > > determine the shape of the DAG.
> > > >
> > > >
> > > > On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <bin.huangxb@gmail.com
> >
> > > > wrote:
> > > > The TaskGroup will not take schedule interval as a parameter itself,
> > and
> > > it
> > > > depends on the DAG where it attaches to. In my opinion, the TaskGroup
> > > will
> > > > only contain a group of tasks with interdependencies, and the
> TaskGroup
> > > > behaves like a task. It doesn't contain any execution/scheduling
> logic
> > > > (i.e. schedule_interval, concurrency, max_active_runs etc.) like a
> DAG
> > > > does.
> > > >
> > > > > For example, there is the scenario that the schedule interval of
> DAG
> > is
> > > > 1 hour and the schedule interval of TaskGroup is 20 min.
> > > >
> > > > I am curious why you ask this. Is this a use case that you want to
> > > achieve?
> > > >
> > > > Bin
> > > >
> > > > On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <th...@gmail.com>
> wrote:
> > > >
> > > > > Hi Bin,
> > > > > Using TaskGroup, Is the schedule interval of TaskGroup the same as
> > the
> > > > > parent DAG? My main concern is whether the schedule interval of
> > > TaskGroup
> > > > > > could be different from that of the DAG? For example, there is the
> > > > scenario
> > > > > that the schedule interval of DAG is 1 hour and the schedule
> interval
> > > of
> > > > > TaskGroup is 20 min.
> > > > >
> > > > > Cheers,
> > > > > Nicholas
> > > > >
> > > > > On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <
> bin.huangxb@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi Nicholas,
> > > > > >
> > > > > > I am not sure about the old behavior of SubDagOperator, maybe it
> > will
> > > > > throw
> > > > > > an error? But in the original proposal, the subdag's
> > > schedule_interval
> > > > > will
> > > > > > be ignored. Or if we decide to use TaskGroup to replace SubDag,
> > there
> > > > > will
> > > > > > be no subdag schedule_interval.
> > > > > >
> > > > > > Bin
> > > > > >
> > > > > > On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <th...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Bin,
> > > > > > > Thanks for your good proposal. I was confused whether the
> > schedule
> > > > > > > interval of SubDAG is different from that of the parent DAG? I
> > have
> > > > > > > discussed with Jiajie Zhong about the schedule interval of
> > SubDAG.
> > > If
> > > > > the
> > > > > > > SubDagOperator has a different schedule interval, what will
> > happen
> > > > for
> > > > > > the
> > > > > > > scheduler to schedule the parent DAG?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Nicholas Jiang
> > > > > > >
> > > > > > > On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <
> > > bin.huangxb@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > > Thank you Max, Kaxil, and everyone for the feedback!
> > > > > > > > >
> > > > > > > > > I have rethought the concept of subdag and task
> groups. I
> > > > think
> > > > > > the
> > > > > > > > better way to approach this is to entirely remove subdag and
> > > > > introduce
> > > > > > > the
> > > > > > > > concept of TaskGroup, which is a container of tasks along
> with
> > > > their
> > > > > > > > dependencies *without execution/scheduling logic as a DAG*.
> The
> > > > only
> > > > > > > > purpose of it is to group a list of tasks, but you still need
> > to
> > > > add
> > > > > it
> > > > > > > to
> > > > > > > > a DAG for execution.
> > > > > > > >
> > > > > > > > Here is a small code snippet.
> > > > > > > >
> > > > > > > > ```
> > > > > > > > class TaskGroup:
> > > > > > > > """
> > > > > > > > A TaskGroup contains a group of tasks.
> > > > > > > >
> > > > > > > > If default_args is missing, it will take default args from
> the
> > > > > DAG.
> > > > > > > > """
> > > > > > > > def __init__(self, group_id, default_args):
> > > > > > > > pass
> > > > > > > >
> > > > > > > >
> > > > > > > > """
> > > > > > > > You can add tasks to a task group similar to adding tasks to
> a
> > > DAG
> > > > > > > >
> > > > > > > > This can be declared in a separate file from the dag file
> > > > > > > > """
> > > > > > > > download_group = TaskGroup(group_id='download',
> > > > > > > default_args=default_args)
> > > > > > > > download_group.add_task(task1)
> > > > > > > > task2.dag = download_group
> > > > > > > >
> > > > > > > > with download_group:
> > > > > > > > task3 = DummyOperator(task_id='task3')
> > > > > > > >
> > > > > > > > [task, task2] >> task3
> > > > > > > >
> > > > > > > >
> > > > > > > > """Add it to a DAG for execution"""
> > > > > > > > with DAG(dag_id='start_download_dag',
> > default_args=default_args,
> > > > > > > > schedule_interval='@daily', ...) as dag:
> > > > > > > > start = DummyOperator(task_id='start')
> > > > > > > > start >> download_group
> > > > > > > > # this is equivalent to
> > > > > > > > # start >> [task, task2] >> task3
> > > > > > > > ```
> > > > > > > >
> > > > > > > > With this, we can still reuse a group of tasks and set
> > > dependencies
> > > > > > > between
> > > > > > > > them; it avoids the boilerplate code from using
> SubDagOperator,
> > > and
> > > > > we
> > > > > > > can
> > > > > > > > declare dependencies as `task >> task_group >> task`.
> > > > > > > >
> > > > > > > > User migration wise, we can introduce it before Airflow 2.0
> and
> > > > allow
> > > > > > > > gradual transition. Then we can decide if we still want to
> keep
> > > the
> > > > > > > > SubDagOperator or simply remove it.
> > > > > > > >
> > > > > > > > Any thoughts?
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Bin
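
One way to read the TaskGroup proposal above is that a group only forwards
`>>` to its root and leaf tasks, so no scheduling logic lives in the group
itself. The following is a minimal, self-contained sketch of that reading,
not the AIP's or Airflow's actual implementation. `DependencyMixin`, `Task`,
`TaskGroup`, `roots` and `leaves` are hypothetical stand-ins written for this
illustration; they are not Airflow classes.

```python
class DependencyMixin:
    """Shared `>>` behaviour for single tasks and groups (hypothetical helper)."""

    def __rshift__(self, other):
        # self >> other: each root of `other` waits for each leaf of `self`.
        for downstream in other.roots():
            for upstream in self.leaves():
                downstream.upstream.add(upstream.task_id)
        return other

    def __rrshift__(self, others):
        # [a, b] >> self: a plain list has no `>>`, so Python calls this instead.
        for item in others:
            item >> self
        return self


class Task(DependencyMixin):
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()  # ids of tasks that must finish first

    def roots(self):
        return [self]

    def leaves(self):
        return [self]


class TaskGroup(DependencyMixin):
    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = []

    def add_task(self, task):
        self.tasks.append(task)
        return task

    def roots(self):
        # Tasks with no upstream inside the group.
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not (t.upstream & ids)]

    def leaves(self):
        # Tasks that no other task in the group depends on.
        used = set().union(*(t.upstream for t in self.tasks))
        return [t for t in self.tasks if t.task_id not in used]


download_group = TaskGroup("download")
task1 = download_group.add_task(Task("task1"))
task2 = download_group.add_task(Task("task2"))
task3 = download_group.add_task(Task("task3"))
[task1, task2] >> task3

start, end = Task("start"), Task("end")
start >> download_group >> end
```

In this sketch `start >> download_group` only touches task1 and task2 (the
group's roots), and `download_group >> end` only touches task3 (its leaf),
which matches the "equivalent to" comment in the snippet above.
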
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <
> > > > > > > > maximebeauchemin@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > +1, proposal looks good.
> > > > > > > > >
> > > > > > > > > The original intention was really to have tasks groups and
> a
> > > > > > > zoom-in/out
> > > > > > > > in
> > > > > > > > > the UI. The original reasoning was to reuse the DAG object
> > > since
> > > > it
> > > > > > is
> > > > > > > a
> > > > > > > > > group of tasks, but as highlighted here it does create
> > > underlying
> > > > > > > > > confusions since a DAG is much more than just a group of
> > tasks.
> > > > > > > > >
> > > > > > > > > Max
> > > > > > > > >
> > > > > > > > > On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <
> > > > > > > > joshipoornima06@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thank you for your email.
> > > > > > > > > >
> > > > > > > > > > On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <
> > > > > > bin.huangxb@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > > - *Unpack SubDags during dag parsing*: This
> rewrites
> > > the
> > > > > > > > > > > *DagBag.bag_dag*
> > > > > > > > > > > > > method to unpack subdag while parsing, and it will
> > > give a
> > > > > > > flat
> > > > > > > > > > > > > structure at
> > > > > > > > > > > > > the task level
> > > > > > > > > > > >
> > > > > > > > > > > > The serialized_dag representation already does this I
> > > > think.
> > > > > At
> > > > > > > > least
> > > > > > > > > > if
> > > > > > > > > > > > I've understood your idea here correctly.
> > > > > > > > > > >
> > > > > > > > > > > I am not sure about serialized_dag representation, but
> at
> > > > least
> > > > > > it
> > > > > > > > will
> > > > > > > > > > > still keep the subdag entry in the DAG table? In my
> > > proposal
> > > > as
> > > > > > > also
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > draft PR, the idea is to *extract the tasks from the
> > subdag
> > > > and
> > > > > > add
> > > > > > > > > them
> > > > > > > > > > > back to the root_dag. *So the runtime DAG graph will
> look
> > > > > exactly
> > > > > > > the
> > > > > > > > > > > same as without subdag but with metadata attached to
> > those
> > > > > > > sections.
> > > > > > > > > > These
> > > > > > > > > > > metadata will be later on used to render in the UI. So
> > > after
> > > > > > > parsing
> > > > > > > > (
> > > > > > > > > > > *DagBag.process_file()*), it will just output the
> > *root_dag
> > > > > > > *instead
> > > > > > > > of
> > > > > > > > > > *root_dag +
> > > > > > > > > > > subdag + subdag + nested subdag* etc.
> > > > > > > > > > >
> > > > > > > > > > > - e.g. section-1-* will have metadata
> > > > > current_group=section-1,
> > > > > > > > > > > parent_group=<the-root-dag-id> (welcome for naming
> > > > > > suggestions),
> > > > > > > > the
> > > > > > > > > > > reason for parent_group is that we can have nested
> group
> > > and
> > > > > > > still
> > > > > > > > > be
> > > > > > > > > > > able to capture the dependency.
> > > > > > > > > > >
> > > > > > > > > > > Runtime DAG:
> > > > > > > > > > > [image: image.png]
> > > > > > > > > > >
> > > > > > > > > > > While at the UI, what we see would be something like
> this
> > > by
> > > > > > > > utilizing
> > > > > > > > > > the
> > > > > > > > > > > metadata, and then we can expand or zoom into in some
> > way.
> > > > > > > > > > > [image: image.png]
> > > > > > > > > > >
> > > > > > > > > > > > The benefits I can see are:
> > > > > > > > > > > 1. We don't need to deal with the extra complexity of
> > > SubDag
> > > > > for
> > > > > > > > > > execution
> > > > > > > > > > > and scheduling. It will be the same as not using
> SubDag.
> > > > > > > > > > > 2. Still have the benefits of modularized and reusable
> > dag
> > > > code
> > > > > > and
> > > > > > > > > > > declare dependencies between them. And with the new
> > > > > > SubDagOperator
> > > > > > > > (see
> > > > > > > > > > AIP
> > > > > > > > > > > or draft PR), we can use the same dag_factory function
> > for
> > > > > > > > generating 1
> > > > > > > > > > > dag, a lot of dynamic dags, or used for SubDag (in this
> > > case,
> > > > > it
> > > > > > > will
> > > > > > > > > > just
> > > > > > > > > > > extract all underlying tasks and append to the root
> dag).
> > > > > > > > > > >
> > > > > > > > > > > - Then it gets to the idea of replacing subdag with a
> > > > > simpler
> > > > > > > > > concept
> > > > > > > > > > > by Ash: the proposed change basically drains out the
> > > > > contents
> > > > > > > of
> > > > > > > > a
> > > > > > > > > > SubDag
> > > > > > > > > > > and becomes more like
> > > > > > > ExtractSubdagTasksAndAppendToRootdagOperator
> > > > > > > > > > (forgive
> > > > > > > > > > > > me about the crazy name..). In this case, is it still
> > > > > > necessary
> > > > > > > to
> > > > > > > > > > keep the
> > > > > > > > > > > concept of subdag as it is nothing more than a name?
> > > > > > > > > > >
> > > > > > > > > > > That's why the TaskGroup idea comes up. Thanks Chris
> > Palmer
> > > > for
> > > > > > > > helping
> > > > > > > > > > > conceptualize the functionality of TaskGroup, I will
> just
> > > > paste
> > > > > > it
> > > > > > > > > here.
> > > > > > > > > > >
> > > > > > > > > > > > - Tasks can be added to a TaskGroup
> > > > > > > > > > > > - You *can* have dependencies between Tasks in the
> same
> > > > > > > TaskGroup,
> > > > > > > > > but
> > > > > > > > > > > > *cannot* have dependencies between a Task in a
> > TaskGroup
> > > > > and
> > > > > > > > > either a
> > > > > > > > > > > > Task in a different TaskGroup or a Task not in any
> > group
> > > > > > > > > > > > - You *can* have dependencies between a TaskGroup and
> > > > > either
> > > > > > > > other
> > > > > > > > > > > > TaskGroups or Tasks not in any group
> > > > > > > > > > > > - The UI will by default render a TaskGroup as a
> single
> > > > > > > "object",
> > > > > > > > > but
> > > > > > > > > > > > which you expand or zoom into in some way
> > > > > > > > > > > > - You'd need some way to determine what the "status"
> > of a
> > > > > > > > TaskGroup
> > > > > > > > > > was
> > > > > > > > > > > > at least for UI display purposes
> > > > > > > > > > >
> > > > > > > > > > > I agree with Chris:
> > > > > > > > > > > - From the backend's view (scheduler & executor), I
> think
> > > > > > TaskGroup
> > > > > > > > > > should
> > > > > > > > > > > be ignored during execution. (unless we decide to
> > implement
> > > > > some
> > > > > > > > > metadata
> > > > > > > > > > > operations that allows start/stop a group of tasks
> etc.)
> > > > > > > > > > > - From the UI's View, it should be able to pick up the
> > > > > individual
> > > > > > > > > tasks'
> > > > > > > > > > > status and then determine the TaskGroup's status
> > > > > > > > > > >
> > > > > > > > > > > Bin
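
The last point above, that the UI picks up the individual tasks' status and
derives the TaskGroup's status from it, could look roughly like the sketch
below. This is only an illustration under stated assumptions: the
`group_states` function, the `STATE_PRIORITY` ordering, and the idea that each
task carries a single group id (the `current_group` metadata discussed above)
are all hypothetical choices, not something the thread or Airflow defines.

```python
from collections import defaultdict

# Assumed priority: the first state in this list that any task in the group
# has becomes the state shown for the whole group.
STATE_PRIORITY = [
    "failed",
    "upstream_failed",
    "running",
    "queued",
    "scheduled",
    "success",
    "skipped",
]


def group_states(task_states, task_groups):
    """Roll task states up to one state per group.

    task_states: task_id -> state string (or None if not started)
    task_groups: task_id -> group id (the `current_group` metadata)
    """
    states_by_group = defaultdict(set)
    for task_id, state in task_states.items():
        states_by_group[task_groups[task_id]].add(state)

    rolled_up = {}
    for group_id, states in states_by_group.items():
        # None means no task in the group has a recognised state yet.
        rolled_up[group_id] = next(
            (s for s in STATE_PRIORITY if s in states), None
        )
    return rolled_up


states = {
    "section-1-task-1": "success",
    "section-1-task-2": "running",
    "section-2-task-1": "success",
}
groups = {
    "section-1-task-1": "section-1",
    "section-1-task-2": "section-1",
    "section-2-task-1": "section-2",
}
summary = group_states(states, groups)
```

Because the roll-up is computed purely from task states, the scheduler and
executor never need to know that groups exist, which is the point made above.
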
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman <
> > > > > > > > > > > daniel.imberman@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > >> I hadn’t thought about using the `>>` operator to tie
> > dags
> > > > > > > together
> > > > > > > > > but
> > > > > > > > > > I
> > > > > > > > > > >> think that sounds pretty great! I wonder if we could
> > > > > essentially
> > > > > > > > write
> > > > > > > > > > in
> > > > > > > > > > >> the ability to set dependencies to all starter-tasks
> for
> > > > that
> > > > > > DAG.
> > > > > > > > > > >>
> > > > > > > > > > >> I’m personally ok with SubDag being a mostly UI
> concept.
> > > It
> > > > > > > doesn’t
> > > > > > > > > need
> > > > > > > > > > >> to execute separately, you’re just adding more tasks
> to
> > > the
> > > > > > queue
> > > > > > > > that
> > > > > > > > > > will
> > > > > > > > > > >> be executed when there are resources available.
> > > > > > > > > > >>
> > > > > > > > > > >> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <
> > > > > > chris@crpalmer.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > >> I agree that SubDAGs are an overly complex
> abstraction.
> > I
> > > > > think
> > > > > > > what
> > > > > > > > > is
> > > > > > > > > > >> needed/useful is a TaskGroup concept. On a high level
> I
> > > > think
> > > > > > you
> > > > > > > > want
> > > > > > > > > > >> this
> > > > > > > > > > >> functionality:
> > > > > > > > > > >>
> > > > > > > > > > >> - Tasks can be added to a TaskGroup
> > > > > > > > > > >> - You *can* have dependencies between Tasks in the
> same
> > > > > > TaskGroup,
> > > > > > > > but
> > > > > > > > > > >> *cannot* have dependencies between a Task in a
> TaskGroup
> > > and
> > > > > > > either
> > > > > > > > a
> > > > > > > > > > >> Task in a different TaskGroup or a Task not in any
> group
> > > > > > > > > > >> - You *can* have dependencies between a TaskGroup and
> > > either
> > > > > > other
> > > > > > > > > > >> TaskGroups or Tasks not in any group
> > > > > > > > > > >> - The UI will by default render a TaskGroup as a
> single
> > > > > > "object",
> > > > > > > > but
> > > > > > > > > > >> which you expand or zoom into in some way
> > > > > > > > > > >> - You'd need some way to determine what the "status"
> of
> > a
> > > > > > > TaskGroup
> > > > > > > > > was
> > > > > > > > > > >> at least for UI display purposes
> > > > > > > > > > >>
> > > > > > > > > > >> Not sure if it would need to be a top level object
> with
> > > its
> > > > > own
> > > > > > > > > database
> > > > > > > > > > >> table and model or just another attribute on tasks. I
> > > think
> > > > > you
> > > > > > > > could
> > > > > > > > > > >> build
> > > > > > > > > > > >> it in a way such that from the scheduler's point of
> view
> > a
> > > > DAG
> > > > > > with
> > > > > > > > > > >> TaskGroups doesn't get treated any differently. So it
> > > really
> > > > > > just
> > > > > > > > > > becomes
> > > > > > > > > > >> a
> > > > > > > > > > >> shortcut for setting dependencies between sets of
> Tasks,
> > > and
> > > > > > > allows
> > > > > > > > > the
> > > > > > > > > > UI
> > > > > > > > > > >> to simplify the render of the DAG structure.
> > > > > > > > > > >>
> > > > > > > > > > >> Chris
> > > > > > > > > > >>
> > > > > > > > > > >> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > > > > > > > > <ddavydov@twitter.com.invalid
> > > > > > > > > > >> >
> > > > > > > > > > >> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > Agree with James (and think it's actually the more
> > > > important
> > > > > > > issue
> > > > > > > > > to
> > > > > > > > > > >> fix),
> > > > > > > > > > > >> > but I am still convinced Ash's idea is the right way
> > > > forward
> > > > > > > (just
> > > > > > > > it
> > > > > > > > > > >> might
> > > > > > > > > > >> > require a bit more work to deprecate than adding
> > visual
> > > > > > grouping
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > >> > UI).
> > > > > > > > > > >> >
> > > > > > > > > > >> > There was a previous thread about this FYI with more
> > > > context
> > > > > > on
> > > > > > > > why
> > > > > > > > > > >> subdags
> > > > > > > > > > >> > are bad and potential solutions:
> > > > > > > > > > >> >
> > > > > > >
> > https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > > > > > > . A
> > > > > > > > > > > >> > solution I outline there to James's problem is e.g. enabling the `>>`
> > > > > > > > > > > >> > operator for Airflow operators to work with DAGs as well. I
> see
> > > > this
> > > > > > > being
> > > > > > > > > > >> separate
> > > > > > > > > > > >> > from Ash's solution for DAG grouping in the UI but
> one
> > of
> > > > the
> > > > > > two
> > > > > > > > > items
> > > > > > > > > > >> > required to replace all existing subdag
> functionality.
> > > > > > > > > > >> >
> > > > > > > > > > >> > I've been working with subdags for 3 years and they
> > are
> > > > > > always a
> > > > > > > > > giant
> > > > > > > > > > >> pain
> > > > > > > > > > >> > to use. They are a constant source of user confusion
> > and
> > > > > > > breakages
> > > > > > > > > > >> during
> > > > > > > > > > >> > upgrades. Would love to see them gone :).
> > > > > > > > > > >> >
> > > > > > > > > > >> > On Fri, Jun 12, 2020 at 11:11 AM James Coder <
> > > > > > > jcoder01@gmail.com>
> > > > > > > > > > >> wrote:
> > > > > > > > > > >> >
> > > > > > > > > > >> > > I'm not sure I totally agree it's just a UI
> > concept. I
> > > > use
> > > > > > the
> > > > > > > > > > subdag
> > > > > > > > > > >> > > operator to simplify dependencies too. If you
> have a
> > > > group
> > > > > > of
> > > > > > > > > tasks
> > > > > > > > > > >> that
> > > > > > > > > > >> > > need to finish before another group of tasks
> start,
> > > > using
> > > > > a
> > > > > > > > subdag
> > > > > > > > > > is
> > > > > > > > > > >> a
> > > > > > > > > > >> > > pretty quick way to set those dependencies and I
> > think
> > > > > also
> > > > > > > make
> > > > > > > > > it
> > > > > > > > > > >> > easier
> > > > > > > > > > >> > > to follow the dag code.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <
> > > > > > > > hamlin.kn@gmail.com>
> > > > > > > > > > >> wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > > I second Ash’s grouping concept.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > On Fri, Jun 12, 2020 at 5:10 AM Ash
> Berlin-Taylor
> > <
> > > > > > > > > ash@apache.org
> > > > > > > > > > >
> > > > > > > > > > >> > > wrote:
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > > Question:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Do we even need the SubDagOperator anymore?
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Would removing it entirely and just replacing
> it
> > > > with
> > > > > a
> > > > > > UI
> > > > > > > > > > >> grouping
> > > > > > > > > > >> > > > > concept be conceptually simpler, less to get
> > > wrong,
> > > > > and
> > > > > > > > closer
> > > > > > > > > > to
> > > > > > > > > > >> > what
> > > > > > > > > > >> > > > > users actually want to achieve with subdags?
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > With your proposed change, tasks in subdags
> > could
> > > > > start
> > > > > > > > > running
> > > > > > > > > > in
> > > > > > > > > > >> > > > > parallel (a good change) -- so should we not
> > also
> > > > just
> > > > > > > > > > _entirely_
> > > > > > > > > > >> > > remove
> > > > > > > > > > >> > > > > the concept of a sub dag and replace it with
> > > > something
> > > > > > > > > simpler.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Problems with subdags (I think. I haven't used
> > > them
> > > > > > > > > extensively
> > > > > > > > > > so
> > > > > > > > > > >> > may
> > > > > > > > > > >> > > > > be wrong on some of these):
> > > > > > > > > > >> > > > > - They need their own dag_id, but it has(?) to
> > be
> > > of
> > > > > the
> > > > > > > > form
> > > > > > > > > > >> > > > > `parent_dag_id.subdag_id`.
> > > > > > > > > > >> > > > > - They need their own schedule_interval, but
> it
> > > has
> > > > to
> > > > > > > match
> > > > > > > > > the
> > > > > > > > > > >> > parent
> > > > > > > > > > >> > > > dag
> > > > > > > > > > >> > > > > - Sub dags can be paused on their own. (Does
> it
> > > make
> > > > > > sense
> > > > > > > > to
> > > > > > > > > do
> > > > > > > > > > >> > this?
> > > > > > > > > > >> > > > > Pausing just a sub dag would mean the sub dag
> > > would
> > > > > > never
> > > > > > > > > > >> execute, so
> > > > > > > > > > > >> > > > > the SubDagOperator would fail too.)
> > > > > > > > > > > >> > > > > - You had to choose the executor to operate a
> > > > subdag
> > > > > > with
> > > > > > > > --
> > > > > > > > > > >> always
> > > > > > > > > > >> > a
> > > > > > > > > > >> > > > > bit of a kludge.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Thoughts?
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > -ash
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > On Jun 12 2020, at 12:01 pm, Ash
> Berlin-Taylor <
> > > > > > > > > ash@apache.org>
> > > > > > > > > > >> > wrote:
> > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > > Work on sub-dags is much needed, I'm excited
> to
> > > see
> > > > > how
> > > > > > > > this
> > > > > > > > > > >> > > progresses.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > >> - *Unpack SubDags during dag parsing*: This
> > > > > rewrites
> > > > > > > the
> > > > > > > > > > >> > > > > *DagBag.bag_dag*
> > > > > > > > > > >> > > > > >> method to unpack subdag while parsing, and
> it
> > > > will
> > > > > > > give a
> > > > > > > > > > flat
> > > > > > > > > > >> > > > > >> structure at
> > > > > > > > > > >> > > > > >> the task level
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > The serialized_dag representation already
> does
> > > > this
> > > > > I
> > > > > > > > think.
> > > > > > > > > > At
> > > > > > > > > > >> > least
> > > > > > > > > > >> > > > if
> > > > > > > > > > >> > > > > > I've understood your idea here correctly.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > -ash
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > On Jun 12 2020, at 9:51 am, Xinbin Huang <
> > > > > > > > > > bin.huangxb@gmail.com
> > > > > > > > > > >> >
> > > > > > > > > > >> > > > wrote:
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > >> Hi everyone,
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > > >> > > > > >> Sending a message to everyone to collect
> > > > feedback
> > > > > on
> > > > > > > the
> > > > > > > > > > >> AIP-34
> > > > > > > > > > >> > on
> > > > > > > > > > >> > > > > >> rewriting SubDagOperator. This was
> previously
> > > > > briefly
> > > > > > > > > > >> mentioned in
> > > > > > > > > > >> > > the
> > > > > > > > > > >> > > > > >> discussion about what needs to be done for
> > > > Airflow
> > > > > > 2.0,
> > > > > > > > and
> > > > > > > > > > >> one of
> > > > > > > > > > >> > > the
> > > > > > > > > > >> > > > > >> ideas is to make SubDagOperator attach
> tasks
> > > back
> > > > > to
> > > > > > > the
> > > > > > > > > root
> > > > > > > > > > >> DAG.
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> This AIP-34 focuses on solving
> SubDagOperator
> > > > > related
> > > > > > > > > issues
> > > > > > > > > > by
> > > > > > > > > > >> > > > > reattaching
> > > > > > > > > > >> > > > > >> all tasks back to the root dag while
> > respecting
> > > > > > > > > dependencies
> > > > > > > > > > >> > during
> > > > > > > > > > >> > > > > >> parsing. The original grouping effect on
> the
> > UI
> > > > > will
> > > > > > be
> > > > > > > > > > >> achieved
> > > > > > > > > > >> > > > through
> > > > > > > > > > >> > > > > >> grouping related tasks by metadata.
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> This also makes the dag_factory function
> more
> > > > > > reusable
> > > > > > > > > > because
> > > > > > > > > > >> you
> > > > > > > > > > >> > > > don't
> > > > > > > > > > >> > > > > >> need to have parent_dag_name and
> > child_dag_name
> > > > in
> > > > > > the
> > > > > > > > > > function
> > > > > > > > > > >> > > > > signature
> > > > > > > > > > >> > > > > >> anymore.
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> Changes proposed:
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> - *Unpack SubDags during dag parsing*: This
> > > > > rewrites
> > > > > > > the
> > > > > > > > > > >> > > > > *DagBag.bag_dag*
> > > > > > > > > > >> > > > > >> method to unpack subdag while parsing, and
> it
> > > > will
> > > > > > > give a
> > > > > > > > > > flat
> > > > > > > > > > >> > > > > >> structure at
> > > > > > > > > > >> > > > > >> the task level
> > > > > > > > > > >> > > > > >> - *Simplify SubDagOperator*: The new
> > > > SubDagOperator
> > > > > > > acts
> > > > > > > > > > like a
> > > > > > > > > > >> > > > > >> container and most of the original methods
> > are
> > > > > > removed.
> > > > > > > > The
> > > > > > > > > > >> > > > > >> signature is
> > > > > > > > > > >> > > > > >> also changed to *subdag_factory *with
> > > > *subdag_args
> > > > > > *and
> > > > > > > > > > >> > > > > *subdag_kwargs*.
> > > > > > > > > > >> > > > > >> This is similar to the PythonOperator
> > > signature.
> > > > > > > > > > >> > > > > >> - *Add a TaskGroup model and add
> > current_group
> > > &
> > > > > > > > > parent_group
> > > > > > > > > > >> > > > > attributes
> > > > > > > > > > >> > > > > >> to BaseOperator*: This metadata is used to
> > > group
> > > > > > tasks
> > > > > > > > for
> > > > > > > > > > >> > > > > >> rendering at
> > > > > > > > > > >> > > > > >> UI level. It may potentially extend further
> > to
> > > > > group
> > > > > > > > > > arbitrary
> > > > > > > > > > >> > > tasks
> > > > > > > > > > >> > > > > >> outside the context of subdag to allow
> > > > group-level
> > > > > > > > > operations
> > > > > > > > > > >> > > (i.e.
> > > > > > > > > > > >> > > > > >> stop/trigger a group of tasks within the
> dag)
> > > > > > > > > > >> > > > > >> - *Webserver UI for SubDag*: Proposed UI
> > > > > modification
> > > > > > > to
> > > > > > > > > > allow
> > > > > > > > > > >> > > > > >> (un)collapse a group of tasks for a flat
> > > > structure
> > > > > to
> > > > > > > > pair
> > > > > > > > > > with
> > > > > > > > > > >> > > the
> > > > > > > > > > >> > > > > first
> > > > > > > > > > >> > > > > >> change instead of the original hierarchical
> > > > > > structure.
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> Please see related documents and PRs for
> > > details:
> > > > > > > > > > > >> > > > > >> AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> Original Issue:
> > > > > > > > > > https://github.com/apache/airflow/issues/8078
> > > > > > > > > > >> > > > > >> Draft PR:
> > > > > > https://github.com/apache/airflow/pull/9243
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> Please let me know if there are any aspects
> > > that
> > > > > you
> > > > > > > > > > >> > agree/disagree
> > > > > > > > > > >> > > > > >> with or
> > > > > > > > > > >> > > > > >> need more clarification (especially the
> third
> > > > > change
> > > > > > > > > > regarding
> > > > > > > > > > >> > > > > TaskGroup).
> > > > > > > > > > >> > > > > >> Any comments are welcome and I am looking
> > > forward
> > > > > to
> > > > > > > it!
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > > >> Cheers
> > > > > > > > > > >> > > > > >> Bin
> > > > > > > > > > >> > > > > >>
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > --
> > > > > > > > > > >> > > > Kyle Hamlin
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Thanks & Regards
> > > > > > > > > > Poornima
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>