You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/19 15:29:00 UTC

[GitHub] [airflow] koroder opened a new pull request, #27145: Add support for a DAG display name different than dag_id

koroder opened a new pull request, #27145:
URL: https://github.com/apache/airflow/pull/27145

   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on a diff in pull request #27145: Add support for DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #27145:
URL: https://github.com/apache/airflow/pull/27145#discussion_r999631573


##########
airflow/models/dag.py:
##########
@@ -451,6 +453,7 @@ def __init__(
             max_active_tasks = concurrency
         self._max_active_tasks = max_active_tasks
         self._pickle_id: int | None = None
+        self._display_name = dag_id if display_name is None else display_name

Review Comment:
   I think this require some more thinking.
   Ideally if we have display name for Dags set by the user then dag_id should be automaticly generated by Airflow without user involvment - I think it should be a unique identifier that can not be changed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #27145: Add support for DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #27145:
URL: https://github.com/apache/airflow/pull/27145#discussion_r1005297516


##########
airflow/models/dag.py:
##########
@@ -451,6 +453,7 @@ def __init__(
             max_active_tasks = concurrency
         self._max_active_tasks = max_active_tasks
         self._pickle_id: int | None = None
+        self._display_name = dag_id if display_name is None else display_name

Review Comment:
   This would allow the user to pass in an empty display name, which would look pretty terrible anywhere. I’d make the argument `display_name: str = ""` instead and do `self._display_name = display_name or dag_id` here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #27145: Add support for DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #27145:
URL: https://github.com/apache/airflow/pull/27145#issuecomment-1336475090

   Hey @koroder - are you raising the discussion about it as advised ? I think it is an important topic and would be great if it is discussed in the devlist. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] koroder commented on pull request #27145: Add support for DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
koroder commented on PR #27145:
URL: https://github.com/apache/airflow/pull/27145#issuecomment-1301917023

   I see two ideas floating at this point:
   1. Using `name` as the only parameter; and then generating unique `dag_id` from it. While this makes the interface simpler; but it makes `dag_id` unknown from the users.
   2. To use `display_name` along with `dag_id` as DAG params. While this is a simpler solution on the backend - it needs lots of work on the frontend for a consistent experience.
   
   How do we usually arrive at decisions here? Making my first PR for the airflow project; and would like to understand the process. Will really appreciate your support.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bbovenzi commented on pull request #27145: Add support for DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
bbovenzi commented on PR #27145:
URL: https://github.com/apache/airflow/pull/27145#issuecomment-1291263756

   We shouldn't make `display_name` the app since we use `dag_id` absolutely everywhere. But then it only appears in some places, and is that just a shorter description?
   I wonder what others think. With `dag_id` remaining the default, where would we want to show `display_name`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #27145: Add support for DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #27145:
URL: https://github.com/apache/airflow/pull/27145#issuecomment-1291241773

   I like how simple it is, though I know different people have different opinion and apparenlty I remember hearing strong voices that we should not add "yet another field for that" and that we should change dag_id to allow any characters. 
   
   I think however, that by applying the proposed solution by @koroder  we solve the purely 'display" problem in much easier way - the problem we have with dag_id is that for MySQL notably we are pretty much completely stuck with custom encoding just to fit the limitations of MySQL indexes and solution to that including migration of historical data would be terribly, terribly complex. I thin @ashb you mentioned several times that you would strongly prefer not introducing a new name on top of id (and I understand why), however I think simplicity of the backend side and the migration complexity for Airflow's DAG id might actually make it a much more pragmatic approach which pretty much does not add the "server side" complexity.
   
   This solution is "good enough" IMHO. Solves the problem of displaying national characters, it is very simple, allows to use default database encoding for the name and is very safe to implement. So it gets my approval in terms of the approach/direction.
   
   However I think the display part of it should be done differently IMHO. There are two problems with the way it is done now:
   
   1) The dag_id should remain the main identifier and should be still display the ID as main identifier. Instead the 'display text" should be displayed additionally to it - using different font (maybe bigger when present, maybe the "Display" should be above the id, but it should be clearly visible that the original dag_id is the unique identifier of the DAG. For all practical purposes in all other places where it is referred, dag_id should still be used. The DisplayName should be at most an "annotation", extra information that should be displayed next to dag_id. It might be even with bigger fonts or more prominent, but it should be clear that dag_id is the identifier. 
   
   2) (and this is pretty much complete blocker) If we decide to to this and show both ids, this approach should be applied EVERYWHERE. There are mutliple places where dag_id is displayed. Having display_id on only one "dags" view is just plain wrong. The "id" as "title" - means displaying ID as tooltip is just plain obvious it should not be done this way. IMHO It is next to unusable (what if you want to copy&paste the id for example ??), and terribly inconsistent with the rest of the UI. With the addition of the Datasets recently this is even mor complsx because DAG Ids are appearing also in the charts. The UI details panel also displays DAG_ID in a number of places. 
   
   I think @bbovenzi and @pierrejeambrun - I think if we want to do it this way, this change becomes mostly a UI change. The backend seems super simple to implement, but the UI part will become much more complex - fitting the id and name will change likely layout of a number of views and it will not be super-simple to apply it everywhere consistently (plus the Datasets make it even more complex I think).  Maybe you can state your opinion on the UI side of such a change as well - 
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #27145: Add support for DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #27145:
URL: https://github.com/apache/airflow/pull/27145#discussion_r1005296303


##########
airflow/models/dag.py:
##########
@@ -451,6 +453,7 @@ def __init__(
             max_active_tasks = concurrency
         self._max_active_tasks = max_active_tasks
         self._pickle_id: int | None = None
+        self._display_name = dag_id if display_name is None else display_name

Review Comment:
   I think what Elad meant is the user should be able to just do something like
   
   ```python
   DAG(name="俺のダッグ")
   ```
   
   and _internally_ Airflow would automatically generate a `dag_id` for it (say with `uuid`). Since (ideally) all user-facing interface would just show the user-friendly name, the auto-generated ID can be basically anything and wouldn’t matter to the user at all. And if the user does care, they can always still pass in a custom `dag_id` explicitly:
   
   ```python
   DAG(dag_id="my_dag", name="俺のダッグ")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #27145: Add support for DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #27145:
URL: https://github.com/apache/airflow/pull/27145#discussion_r1009013748


##########
airflow/models/dag.py:
##########
@@ -451,6 +453,7 @@ def __init__(
             max_active_tasks = concurrency
         self._max_active_tasks = max_active_tasks
         self._pickle_id: int | None = None
+        self._display_name = dag_id if display_name is None else display_name

Review Comment:
   Yep. We discussed slugifying the name - that can also be an option, however I think adding optional name to be displayed next to id is a better approach - because having a unique id tha tis "known" and does not require to follow the very same slugifying algorithm is better (for any case where Human seeing a good name has to now what unique id is used)
   
   But seems I am in a minority, and it's not something I feel strongly about, so I am ok with any solutiom - somebody hopefully finally will implement it rather than just talking about it (I think this is about 4th discussion about it I recall). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] koroder commented on a diff in pull request #27145: Add support for a DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
koroder commented on code in PR #27145:
URL: https://github.com/apache/airflow/pull/27145#discussion_r1003216203


##########
airflow/models/dag.py:
##########
@@ -451,6 +453,7 @@ def __init__(
             max_active_tasks = concurrency
         self._max_active_tasks = max_active_tasks
         self._pickle_id: int | None = None
+        self._display_name = dag_id if display_name is None else display_name

Review Comment:
   @eladkal Most of the thought process I have taken from the discussion in https://github.com/apache/airflow/issues/22073 where the idea was to use a separate `display_name` supporting non-ascii characters for DAG/Tasks independently from `dag_id` and `task_id`. Directly using them in `dag_id`/`task_id` breaks couple of functionalities which I have referenced in the description. Let me know if we still need to go by a different approach.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] closed pull request #27145: Add support for DAG display name different than dag_id

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #27145: Add support for DAG display name different than dag_id
URL: https://github.com/apache/airflow/pull/27145


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on a diff in pull request #27145: Add support for a DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #27145:
URL: https://github.com/apache/airflow/pull/27145#discussion_r999631573


##########
airflow/models/dag.py:
##########
@@ -451,6 +453,7 @@ def __init__(
             max_active_tasks = concurrency
         self._max_active_tasks = max_active_tasks
         self._pickle_id: int | None = None
+        self._display_name = dag_id if display_name is None else display_name

Review Comment:
   I think this require some more thinking.
   Ideally if we have display name for Dags set bu the user than dag_id should be automaticly generated by Airflow without user involvment - I think it should be a unique identifier that can not be changed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #27145: Add support for DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #27145:
URL: https://github.com/apache/airflow/pull/27145#issuecomment-1306367337

   raise it to devlist and ask for voting if you cannot reach consensus. https://www.apache.org/foundation/voting.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #27145: Add support for a DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on PR #27145:
URL: https://github.com/apache/airflow/pull/27145#issuecomment-1284196785

   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] koroder commented on pull request #27145: Add support for DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
koroder commented on PR #27145:
URL: https://github.com/apache/airflow/pull/27145#issuecomment-1358924000

   > Hey @koroder - are you raising the discussion about it as advised ? I think it is an important topic and would be great if it is discussed in the devlist.
   
   Sorry for the delay in coming back on this. I will be raising it on the dev list next week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #27145: Add support for DAG display name different than dag_id

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #27145:
URL: https://github.com/apache/airflow/pull/27145#issuecomment-1416548350

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on a diff in pull request #27145: Add support for DAG display name different than dag_id

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #27145:
URL: https://github.com/apache/airflow/pull/27145#discussion_r1005304964


##########
airflow/models/dag.py:
##########
@@ -451,6 +453,7 @@ def __init__(
             max_active_tasks = concurrency
         self._max_active_tasks = max_active_tasks
         self._pickle_id: int | None = None
+        self._display_name = dag_id if display_name is None else display_name

Review Comment:
   Exactly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org