Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/04/02 21:08:37 UTC

[GitHub] [airflow] dimberman opened a new issue #8078: Execute SubDAG tasks as part of parent DAG

dimberman opened a new issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078
 
 
   **Description**
   
   Currently, the SubDagOperator launches a completely different DAG and then monitors it as a separate entity. This has led to all sorts of edge cases (e.g. when workers have different executors than the scheduler).
   
   This issue suggests that we instead merge all tasks from the subdag into the original DAG with an extra "original_dag" label. This means that at the UI level we can still condense all subdag tasks into a single task, but a single scheduler will still perform all task launching.
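
   As a rough illustration of the proposal above, the sketch below merges subdag tasks into the parent and then condenses them again for display. It uses plain Python only: `Task`, `Dag`, `merge_subdag` and `condensed_view` are hypothetical names made up for this example and are not Airflow APIs. Only the "original_dag" label comes from the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical stand-in for a task; not an Airflow class."""
    task_id: str
    original_dag: Optional[str] = None  # label naming the subdag the task came from


@dataclass
class Dag:
    """Hypothetical stand-in for a DAG that only holds a list of tasks."""
    dag_id: str
    tasks: List[Task] = field(default_factory=list)


def merge_subdag(parent: Dag, subdag: Dag) -> None:
    """Copy every subdag task into the parent, tagged with its origin."""
    for task in subdag.tasks:
        parent.tasks.append(
            Task(task_id=f"{subdag.dag_id}.{task.task_id}", original_dag=subdag.dag_id)
        )


def condensed_view(dag: Dag) -> List[str]:
    """Collapse labelled tasks into one node per subdag, as a UI might."""
    nodes: List[str] = []
    for task in dag.tasks:
        node = task.original_dag or task.task_id
        if node not in nodes:
            nodes.append(node)
    return nodes


parent = Dag("etl", [Task("start")])
merge_subdag(parent, Dag("load", [Task("a"), Task("b")]))
parent.tasks.append(Task("end"))
```

   Here the parent holds four ordinary tasks that one scheduler could launch, while `condensed_view(parent)` still shows the "load" tasks as a single node.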
   
   **Use case / motivation**
   
   We want SubDag tasks to act exactly the same as non-subdag tasks.
   
   **Related Issues**
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608094639
 
 
   Yeah, totally agree. I don't think the added complexity of nested DAGs brings more value than errors.
   
   I actually have a simpler thought. Instead of returning a DAG from the dag factory, we can just ask the `dag factory` to return a list of tasks. The `SubDagOperator` (which becomes more like a `SubTasksOperator` in this case) can then be used just to add an annotation to that group of tasks, which will later be used to render the DAG graph in the UI.
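
   A minimal sketch of that idea, assuming tasks can be represented by their ids: the factory returns a list instead of a DAG, and a small helper only attaches a group annotation. `load_tasks_factory` and `annotate_group` are hypothetical names for illustration, not part of Airflow.

```python
from typing import Callable, Dict, List


def load_tasks_factory() -> List[str]:
    """Hypothetical factory: returns task ids instead of a DAG object."""
    return ["extract", "transform", "load"]


def annotate_group(group_id: str, factory: Callable[[], List[str]]) -> List[Dict[str, str]]:
    """Tag each task produced by the factory with the group it belongs to."""
    return [{"task_id": task_id, "group": group_id} for task_id in factory()]


# The annotation carries no execution semantics; a UI could use it for rendering only.
annotated = annotate_group("daily_load", load_tasks_factory)
```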


[GitHub] [airflow] kaxil commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608104764
 
 
   > @kaxil I think I need permission to create an AIP at Confluence. Can you give me permission?
   
   Done


[GitHub] [airflow] xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-610210400
 
 
   > It seems to me that the problem has been fixed? I wonder whether the suggested change is still useful and whether I should proceed to finish the AIP draft.
   
   I think I will proceed with finishing the AIP and then let the community decide. I think it makes more sense to treat a `subdag` as a subgraph of the parent graph rather than as a single vertex, as it is right now.


[GitHub] [airflow] xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-610189799
 
 
   @kaxil @turbaszek @dimberman I found this PR https://github.com/apache/airflow/pull/5498 when I was working on the AIP draft.  It fixed the `SubDagOperator` by using the scheduler instead of backfill, so SubDags will use the same executor as the parent dag. I guess this solves the biggest problem of `SubDagOperator`. 
   
   It seems to me that the problem has been fixed? Should I still proceed to finish the AIP draft?


[GitHub] [airflow] kaxil commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-610370103
 
 
   > > It seems to me that the problem has been fixed? I wonder whether the suggested change is still useful and whether I should proceed to finish the AIP draft.
   > 
   > I think I will proceed with finishing the AIP and then let the community decide. I think it makes more sense to treat a `subdag` as a subgraph of the parent graph rather than as a single vertex, as it is right now.
   
   Yes, please do 👍 


[GitHub] [airflow] kaxil commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608097965
 
 
   > Yeah, totally agree. I don't think the added complexity of nested DAGs brings more value than errors.
   > 
   > I actually have a simpler thought. Instead of returning a DAG from the `dag factory`, we can just ask the `dag factory` to return a list of tasks. The `SubDagOperator` (which becomes more like a `SubTasksOperator` in this case) can then be used just to add an annotation to that group of tasks, which will later be used to render the DAG graph in the UI.
   
   I like the idea.
   
   > @kaxil does this mess with DAG serialization? Would the subdag tasks be added at DAG parsing time?
   
   No that should work fine


[GitHub] [airflow] xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-610189799
 
 
   @kaxil @turbaszek @dimberman I found this PR https://github.com/apache/airflow/pull/5498 when I was working on the AIP draft.  It fixed the `SubDagOperator` by using the scheduler instead of backfill, so SubDags will use the same executor as the parent dag. I guess this solves the biggest problem of `SubDagOperator`. 
   
   It seems to me that the problem has been fixed? Should I proceed to finish the AIP draft?


[GitHub] [airflow] xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-610189799
 
 
   @kaxil @turbaszek @dimberman I found this PR https://github.com/apache/airflow/pull/5498 when I was working on the AIP draft.  It fixed the `SubDagOperator` by using the scheduler instead of backfill, so SubDags will use the same executor as the parent dag. I guess this solves the biggest problem of `SubDagOperator`. 
   
   It seems to me that the problem has been fixed? Is the suggested change still useful, and should I proceed to finish the AIP draft?


[GitHub] [airflow] xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608091070
 
 
   +1


[GitHub] [airflow] dimberman commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608092635
 
 
   @kaxil does this mess with DAG serialization? Would the subdag tasks be added at DAG parsing time?


[GitHub] [airflow] dimberman commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608104379
 
 
   @xinbinhuang SGTM! I'm gonna play around with a Draft PR just to see what I can jerry-rig, but it will be more based around "what CAN we do" than "what WILL we do"


[GitHub] [airflow] xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-610189799
 
 
   @kaxil @turbaszek @dimberman I found this PR https://github.com/apache/airflow/pull/5498 when I was working on the AIP draft.  It fixed the `SubDagOperator` by using the scheduler instead of backfill, so SubDags will use the same executor as the parent dag. I guess this solves the biggest problem of `SubDagOperator`. 
   
   It seems to me that the problem has been fixed? I wonder whether the suggested change is still useful and whether I should proceed to finish the AIP draft.


[GitHub] [airflow] xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608103502
 
 
   @kaxil  I think I need permission to create an AIP at Confluence. Can you give me permission? My email is `bin.huangxb@gmail.com`


[GitHub] [airflow] xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608103502
 
 
   @kaxil  I think I need permission to create an AIP at Confluence. Can you give me permission? 


[GitHub] [airflow] xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608100783
 
 
   > > @kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the `SubDagOperator`, or even to rename or remove it.
   > 
   > Yes, please. This definitely needs voting and some design discussion. Can you please create an AIP and start a discussion thread once you have that AIP doc ready?
   
   Sounds good. I will draft an AIP within this week and open a thread for discussion.


[GitHub] [airflow] xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608094639
 
 
   Yeah, totally agree. I don't think the added complexity of nested DAGs brings more value than errors.
   
   I actually have a simpler thought. Instead of returning a DAG from the `dag factory`, we can just ask the `dag factory` to return a list of tasks. The `SubDagOperator` (which becomes more like a `SubTasksOperator` in this case) can then be used just to add an annotation to that group of tasks, which will later be used to render the DAG graph in the UI.


[GitHub] [airflow] xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-610189799
 
 
   @kaxil @turbaszek @dimberman I found this PR https://github.com/apache/airflow/pull/5498 when I was working on the AIP draft.  It fixed the `SubDagOperator` by using the scheduler instead of backfill, so SubDags will use the same executor as the parent dag. I guess this solves the biggest problem of `SubDagOperator`. 
   
   It seems to me that the problem has been fixed? Should I still proceed to finish the AIP draft?


[GitHub] [airflow] xinbinhuang removed a comment on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang removed a comment on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608103502
 
 
   @kaxil  I think I need permission to create an AIP at Confluence. Can you give me permission? 


[GitHub] [airflow] kaxil commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608100360
 
 
   > @kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the `SubDagOperator`, or even to rename or remove it.
   
   Yes, please. This definitely needs voting and some design discussion. Can you please create an AIP and start a discussion thread once you have that AIP doc ready?


[GitHub] [airflow] dimberman commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608091499
 
 
   I think it's totally reasonable to say that SubDag separation should be more of a UI feature than an execution feature, so I think if we can identify which tasks are "subdag" tasks, we can condense them in the UI.


[GitHub] [airflow] xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608094639
 
 
   Yeah, totally agree. I don't think the added complexity with nested dags gives more errors than values.
   
   I actually have a simpler thought. Instead of returning a DAG from the dag factory, we can just ask the `dag factory` to return a list of tasks. The `SubDagOperator` (which becomes more like a `SubTasksOperator` in this case) can then be used just to add an annotation to that group of tasks, which will later be used to render the DAG graph in the UI.


[GitHub] [airflow] dimberman commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608090598
 
 
   @xinbinhuang 


[GitHub] [airflow] xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-608099758
 
 
   @kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the `SubDagOperator`, or even to rename or remove it.


[GitHub] [airflow] xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #8078: Execute SubDAG tasks as part of parent DAG
URL: https://github.com/apache/airflow/issues/8078#issuecomment-610210400
 
 
   I think I will proceed with finishing the AIP and then let the community decide. I think it makes more sense to treat a `subdag` as a subgraph of the parent graph rather than as a single vertex, as it is right now.
