You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/01 17:57:12 UTC

[GitHub] [airflow] andrewgodwin opened a new pull request #15142: Add support for labelling DAG edges (WIP)

andrewgodwin opened a new pull request #15142:
URL: https://github.com/apache/airflow/pull/15142


   This adds support for putting human-readable labels on edges in the DAG
   between Tasks, as well as making the underlying framework for that
   generic enough that future metadata could be added if desired.
   
   This is not quite complete, but I'm putting it up to get some feedback on the code and API design to ensure it feels right. What's left to do:
   
   - [x] Add support for XComArgs
   - [ ] Add either support or a nice error for TaskGroups
   - [ ] Add tests
   - [ ] Add documentation
   
   closes: #15140 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ryanahamilton commented on pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
ryanahamilton commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-816355063


   I opened #15298 to address the aforementioned visual enhancements needed for this feature.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin edited a comment on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
andrewgodwin edited a comment on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812628196


   The idea for labels is merely to give the user a visual indication of what each edge means - this can be especially important, IMO, when using the branching operators or similar, on larger dags. It has no runtime effect.
   
   I don't think I'd even want to add anything with a runtime effect on edges; Airflow is designed around all the runtime info (priority, etc.) being at the Task level and I like that. The only things I could forsee adding is other informational data, such as a longer "description" field.
   
   The other option to achieve this is to take this, task groups, and any other informational-only parts of airflow and spin them off into a separate presentation layer somehow, and while I do quite like separating presentation and logic, I think that would be too unwieldy to actually be useful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812612267


   > I'm also not sure you could make this work:
   > 
   > ```
   > [t1, t2, t3] >> "label" >> [t4, t5, t6]
   > ```
   
   I don’t think `[t1, t2, t3] >> [t4, t5, t6]` works either, so that’s not a big problem IMO. But I am currently in camp explicit `Label` now although I proposed the `str` syntax; there are things a plain `str` can’t handle, so a class will be needed sooner or later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812082998


   To add to your Todo list: update the SerialisedDag representations to include this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812078553


   > `from airflow.models import Label`
   > 
   > I'm not a fan of this import for in user dags (they generally don't import from models currently) but I don't have an immediate other suggestion
   
   Yeah, I went trawling around some example DAGs to find what path might fit, and I saw a couple with `models` imports for `Variable` and so forth, so I went with that initially. I'm not sure what `airflow.task` is for - seems pretty empty - but that was something else that felt feasible.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang edited a comment on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812326424


   > If we ever want to add edge information beyond just a label (e.g. add a longer description), it means we have to go back to this pattern
   
   Can you give an example of a long description that has to use `Label` but not otherwise?
   
   How about `task1 >> "this is a label" | task2`? This is the exact opposite of the Beam Python SDK [example here](https://beam.apache.org/documentation/sdks/python-streaming/). 
   
   Though I am a bit hesitant to introduce another special operator for Airflow, I think it's still better rather than introducing a `Label` inheriting from `TaskMixin` for these 2 reasons:
   
   - Conceptually, label and task should not be treated the same. The label is metadata but the task is a task. They are different in both UI and execution.
   - In general, I don't like the *explicit* import of UI components in DAG code. I think DAG code should focus on execution/dependency logic
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ryanahamilton commented on a change in pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
ryanahamilton commented on a change in pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#discussion_r610653678



##########
File path: docs/apache-airflow/concepts.rst
##########
@@ -1137,6 +1136,30 @@ TaskGroup can be created using ``@task_group`` decorator, it takes one argument
     :end-before: [END howto_task_group_decorator]
 
 
+Edge Labels
+===========
+
+As well as grouping tasks into groups, you can also label the edges between
+different tasks in the Graph View - this can be especially useful for branching
+areas of your DAG, so you can label the conditions under which certain branches
+might run.

Review comment:
       @andrewgodwin can you also add some Edge Labels to a few of the DAGs in `airflow/example_dags/` in the same PR? It would be useful for this feature to be exposed during local development.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812077444


   ```from airflow.models import Label```
   
   I'm not a fan of this import for in user dags (they generally don't import from models currently) but I don't have an immediate other suggestion


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on a change in pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on a change in pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#discussion_r610259541



##########
File path: docs/apache-airflow/concepts.rst
##########
@@ -1137,6 +1136,30 @@ TaskGroup can be created using ``@task_group`` decorator, it takes one argument
     :end-before: [END howto_task_group_decorator]
 
 
+Edge Labels
+===========
+
+As well as grouping tasks into groups, you can also label the edges between
+different tasks in the Graph View - this can be especially useful for branching
+areas of your DAG, so you can label the conditions under which certain branches
+might run.

Review comment:
       Don't know how I didn't think about that one given it's an _entirely visual feature!_ I'll get it in tomorrow.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on a change in pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on a change in pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#discussion_r607143797



##########
File path: airflow/serialization/schema.json
##########
@@ -102,7 +102,8 @@
         "_task_group": {"anyOf": [
           { "type": "null" },
           { "$ref": "#/definitions/task_group" }
-        ]}
+        ]},
+        "edge_info": { "$ref": "#/definitions/dict" }

Review comment:
       Sure, I'll add a slightly more compact representation. I can't save any actual data, since source-target-label is all that's stored, but we can save the outside wrapping with `_type` etc. at least?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman merged pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
dimberman merged pull request #15142:
URL: https://github.com/apache/airflow/pull/15142


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ryanahamilton commented on pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
ryanahamilton commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-816187909


   There are a couple of visual tweaks we should add when hovering tasks or statuses (e.g. fading out labels ). I'll provide you a solution as I was just modifying these interactions in #15257.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang edited a comment on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812326424


   > If we ever want to add edge information beyond just a label (e.g. add a longer description), it means we have to go back to this pattern
   
   Can you give an example of a long description that has to use `Label` for but not otherwise?
   
   How about `task1 >> "this is a label" | task2`? This is the exact opposite of the Beam Python SDK [example here](https://beam.apache.org/documentation/sdks/python-streaming/). 
   
   Though I am a bit hesitant to introduce another special operator for Airflow, I think it's still better to introduce a `Label` inheriting from `TaskMixin` for these 2 reasons:
   
   - Conceptually, label and task should not be treated the same. The label is metadata but the task is a task. They are different in both UI and execution.
   - In general, I don't like the *explicit* import of UI components in DAG code. I think DAG code should focus on execution/dependency logic
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812544316


   I would argue that this fulfills a similar role as task groups - it's a primarily UI-focused feature, but it's also DAG metadata and should live in it. If it's not in the DAG, where would it go where it can appear both in the webserver view and in the output from `airflow dags show`?
   
   I'm open to changing the user interface here - this is just a rough idea me and @ashb discussed - but I really don't like moving to just a plain `str` as the thing we use with `>>` as it's going to make the code _significantly_ more complex to track where everything is supposed to go, especially when combined with XComArgs (which are the model I based this on). Not to mention all the typing for `>>` expects something inheriting from TaskMixin right now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-816009849


   [The Workflow run](https://github.com/apache/airflow/actions/runs/730353838) is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on a change in pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on a change in pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#discussion_r609972720



##########
File path: airflow/www/templates/airflow/graph.html
##########
@@ -683,10 +683,18 @@
           source_id = map_task_to_node.get(edge.source_id)
           target_id = map_task_to_node.get(edge.target_id)
           if(source_id != target_id && !g.hasEdge(source_id, target_id))
-            g.setEdge(source_id, target_id, {
-              curve: d3.curveBasis,
-              arrowheadClass: 'arrowhead',
-            });
+            if (edge.label != undefined) {
+              g.setEdge(source_id, target_id, {
+                curve: d3.curveBasis,
+                arrowheadClass: 'arrowhead',
+                label: edge.label
+              });
+            } else {
+              g.setEdge(source_id, target_id, {
+                curve: d3.curveBasis,
+                arrowheadClass: 'arrowhead',
+              });
+            }

Review comment:
       👍 Wasn't sure if I could, but great, I'll do that!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812113661


   > `from airflow.models import Label`
   
   How about hiding the `Label` class from the user entirely? Something like `task1 << "Label" << task2` and create the label automatically in `_set_relatives()`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on a change in pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
dimberman commented on a change in pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#discussion_r610327355



##########
File path: docs/apache-airflow/concepts.rst
##########
@@ -1137,6 +1136,30 @@ TaskGroup can be created using ``@task_group`` decorator, it takes one argument
     :end-before: [END howto_task_group_decorator]
 
 
+Edge Labels
+===========
+
+As well as grouping tasks into groups, you can also label the edges between
+different tasks in the Graph View - this can be especially useful for branching
+areas of your DAG, so you can label the conditions under which certain branches
+might run.

Review comment:
       I'm gonna merge this PR so we don't end up in CI hell, but @andrewgodwin please make another PR with this screenshot if you think that will help




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ryanahamilton commented on a change in pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
ryanahamilton commented on a change in pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#discussion_r609935615



##########
File path: airflow/www/templates/airflow/graph.html
##########
@@ -683,10 +683,18 @@
           source_id = map_task_to_node.get(edge.source_id)
           target_id = map_task_to_node.get(edge.target_id)
           if(source_id != target_id && !g.hasEdge(source_id, target_id))
-            g.setEdge(source_id, target_id, {
-              curve: d3.curveBasis,
-              arrowheadClass: 'arrowhead',
-            });
+            if (edge.label != undefined) {
+              g.setEdge(source_id, target_id, {
+                curve: d3.curveBasis,
+                arrowheadClass: 'arrowhead',
+                label: edge.label
+              });
+            } else {
+              g.setEdge(source_id, target_id, {
+                curve: d3.curveBasis,
+                arrowheadClass: 'arrowhead',
+              });
+            }

Review comment:
       This should be fine without the `undefined` check:
   ```suggestion
               g.setEdge(source_id, target_id, {
                 curve: d3.curveBasis,
                 arrowheadClass: 'arrowhead',
                 label: edge.label,
               });
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-816313757


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812087623


   > > To add to your Todo list: update the SerialisedDag representations to include this
   > 
   > That's actually done already, or it wouldn't make it across to the UI. Unless there's extra work above and beyond just making it appear in the schema & json?
   
   Nope, I just missed it then! GitHub mobile is not the best at displaying PRs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
turbaszek commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812618291


   When doing operators for adding labels it would be good to check if they work with lineage operators (`>` and `<` afaik).
   
   @andrewgodwin would you mind elaborating more about the use case for the labels? In my understanding all the actions are performed by tasks and edges in DAGs represent only order (and sometimes data) relation. Adding a label would suggest that some action is performed between tasks. Is it done by XCom or something?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812144541


   > > `from airflow.models import Label`
   > 
   > How about hiding the `Label` class from the user entirely? Something like `task1 << "Label" << task2` and create the label automatically in `_set_relatives()`.
   
   I'd not be a huge fan of that design for two reasons:
   * If we ever want to add edge information beyond just a label (e.g. add a longer description), it means we have to go back to this pattern
   * It's relatively unobvious to new people reading the code what the meaning of a plain string is, whereas I think wrapping it in `Label` gives it immediate meaning to the casual reader


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#discussion_r610207713



##########
File path: docs/apache-airflow/concepts.rst
##########
@@ -1137,6 +1136,30 @@ TaskGroup can be created using ``@task_group`` decorator, it takes one argument
     :end-before: [END howto_task_group_decorator]
 
 
+Edge Labels
+===========
+
+As well as grouping tasks into groups, you can also label the edges between
+different tasks in the Graph View - this can be especially useful for branching
+areas of your DAG, so you can label the conditions under which certain branches
+might run.

Review comment:
       nit: Probably a screenshot in this doc might be helpful




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812326424


   How about `task1 >> "this is a label" | task2`? This is the exact opposite of the Beam Python SDK [example here](https://beam.apache.org/documentation/sdks/python-streaming/). 
   
   Though I am a bit hesitant to introduce another special operator for Airflow, I think it's still better to introduce a `Label` inheriting from `TaskMixin` for 1.5 reasons:
   
   - Conceptually, label and task should not be treated the same. The label is metadata but the task is a task. They are different in both UI and execution.
   - In general, I don't like the *explicit* import of UI components in DAG code. I think DAG code should focus on execution/dependency logic
   
   > If we ever want to add edge information beyond just a label (e.g. add a longer description), it means we have to go back to this pattern
   
   Can you give an example of a long description that has to use `Label` for but not otherwise?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek edited a comment on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
turbaszek edited a comment on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812618291


   When doing operators for adding labels it would be good to check if they work with lineage operators (`>` and `<` afaik).
   
   @andrewgodwin would you mind elaborating more about the use case for the labels? In my understanding all the actions are performed by tasks and edges in DAGs represent only order (and sometimes data) relation. Adding a label to edge would suggest that some action is performed between tasks. Is it done by XCom or something?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang edited a comment on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812326424


   > If we ever want to add edge information beyond just a label (e.g. add a longer description), it means we have to go back to this pattern
   
   Can you give an example of a long description that has to use `Label` but not otherwise?
   
   How about `task1 >> "this is a label" | task2`? This is the exact opposite of the Beam Python SDK [example here](https://beam.apache.org/documentation/sdks/python-streaming/). 
   
   Though I am a bit hesitant to introduce another special operator for Airflow, I think it's still better to introduce a `Label` inheriting from `TaskMixin` for these 2 reasons:
   
   - Conceptually, label and task should not be treated the same. The label is metadata but the task is a task. They are different in both UI and execution.
   - In general, I don't like the *explicit* import of UI components in DAG code. I think DAG code should focus on execution/dependency logic
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin edited a comment on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
andrewgodwin edited a comment on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812544316






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#discussion_r605859767



##########
File path: airflow/serialization/schema.json
##########
@@ -102,7 +102,8 @@
         "_task_group": {"anyOf": [
           { "type": "null" },
           { "$ref": "#/definitions/task_group" }
-        ]}
+        ]},
+        "edge_info": { "$ref": "#/definitions/dict" }

Review comment:
       Oh this could maybe be specified a bit tighter. WDYT @kaxil ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812077444


   ```from airflow.models import Label```
   
   I'm not a fan of this import for in user dags (they generally don't import from models currently) but I don't have an immediate other suggestion.
   
   (Just a comment on "user" import not code location)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-815998707


   (ignore it being ready for review, I clicked that before remembering the serialisation needs fixing)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812086673


   > To add to your Todo list: update the SerialisedDag representations to include this
   
   That's actually done already, or it wouldn't make it across to the UI. Unless there's extra work above and beyond just making it appear in the schema & json?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on a change in pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on a change in pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#discussion_r607138030



##########
File path: airflow/models/edgemodifier.py
##########
@@ -0,0 +1,126 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from typing import Sequence, Union
+
+from airflow.models.taskmixin import TaskMixin
+
+
+class EdgeModifier(TaskMixin):

Review comment:
       Interesting, I was going off of `XComArg` having its own file rather than being in `xcom.py`, but happy to move it. Maybe baseoperator, maybe it should live alongside `taskgroups` in utils since it serves a similar UI-only purpose?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812628196


   The idea for labels is merely to give the user a visual indication of what each edge means - this can be especially important, IMO, when using the branching operators or similar, on larger dags. It has no runtime effect.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] andrewgodwin commented on pull request #15142: Add support for labelling DAG edges

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-816837675


   I've opened https://github.com/apache/airflow/pull/15310 to get the extra examples in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#discussion_r606464589



##########
File path: airflow/models/edgemodifier.py
##########
@@ -0,0 +1,126 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from typing import Sequence, Union
+
+from airflow.models.taskmixin import TaskMixin
+
+
+class EdgeModifier(TaskMixin):

Review comment:
       It isn't a model so we should probably move it to some other place instead of inside `airflow.models` -- probably inside `airflow/models/baseoperator.py` where we also have `set_upstream` and `set_downstream`?

##########
File path: airflow/serialization/schema.json
##########
@@ -102,7 +102,8 @@
         "_task_group": {"anyOf": [
           { "type": "null" },
           { "$ref": "#/definitions/task_group" }
-        ]}
+        ]},
+        "edge_info": { "$ref": "#/definitions/dict" }

Review comment:
       Yea similar to how we only store `_downstream_task_ids` (not the `_upstream_task_ids`) for a task and we can set upstream for the other when de-serializing. We do that to keep the size of serialized blog as small as possible.
   
   
   
   https://github.com/apache/airflow/blob/8cc8d11fb87d0ad5b3b80907874f695a77533bfa/airflow/serialization/serialized_objects.py#L451-L452
   
   https://github.com/apache/airflow/blob/8cc8d11fb87d0ad5b3b80907874f695a77533bfa/airflow/serialization/serialized_objects.py#L666-L668
   
   https://github.com/apache/airflow/blob/8cc8d11fb87d0ad5b3b80907874f695a77533bfa/airflow/serialization/serialized_objects.py#L724-L727
   
   https://github.com/apache/airflow/blob/8cc8d11fb87d0ad5b3b80907874f695a77533bfa/airflow/models/baseoperator.py#L1365-L1370
   
   We pay the price of hardcoding it but it is worth it as we can save MBs (which would be transmitted to and from the database) when number of tasks is huge.
   
   

##########
File path: airflow/utils/types.py
##########
@@ -15,6 +15,13 @@
 # specific language governing permissions and limitations
 # under the License.
 import enum
+from typing import Optional
+
+try:
+    from typing import TypedDict
+except ImportError:
+    # Remove this fallback once our minumum version is 3.8
+    from mypy_extensions import TypedDict

Review comment:
       ```suggestion
   from airflow.typing_compat import TypedDict
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] xinbinhuang edited a comment on pull request #15142: Add support for labelling DAG edges (WIP)

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on pull request #15142:
URL: https://github.com/apache/airflow/pull/15142#issuecomment-812326424


   How about `task1 >> "this is a label" | task2`? This is the exact opposite of the Beam Python SDK [example here](https://beam.apache.org/documentation/sdks/python-streaming/). 
   
   Though I am a bit hesitant to introduce another special operator for Airflow, I think it's still better to introduce a `Label` inheriting from `TaskMixin` for these 2 reasons:
   
   - Conceptually, label and task should not be treated the same. The label is metadata but the task is a task. They are different in both UI and execution.
   - In general, I don't like the *explicit* import of UI components in DAG code. I think DAG code should focus on execution/dependency logic
   
   > If we ever want to add edge information beyond just a label (e.g. add a longer description), it means we have to go back to this pattern
   
   Can you give an example of a long description that has to use `Label` for but not otherwise?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org