You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/13 01:59:20 UTC

[GitHub] [airflow] dstandish opened a new pull request, #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

dstandish opened a new pull request, #25013:
URL: https://github.com/apache/airflow/pull/25013

   In the DAGs page, in place of schedule interval, show `dataset-triggered` to indicate there is no "schedule interval" per se but this is what will determine the next run.  Update tooltip also (derived from dag_model.timetable_description).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] jedcunningham commented on a diff in pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on code in PR #25013:
URL: https://github.com/apache/airflow/pull/25013#discussion_r919620978


##########
airflow/models/dag.py:
##########
@@ -2495,11 +2495,15 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=NEW_SESSION):
             orm_dag.last_parsed_time = timezone.utcnow()
             orm_dag.default_view = dag.default_view
             orm_dag.description = dag.description
-            orm_dag.schedule_interval = dag.schedule_interval
             orm_dag.max_active_tasks = dag.max_active_tasks
             orm_dag.max_active_runs = dag.max_active_runs
             orm_dag.has_task_concurrency_limits = any(t.max_active_tis_per_dag is not None for t in dag.tasks)
-            orm_dag.timetable_description = dag.timetable.description
+            if dag.schedule_on:
+                orm_dag.schedule_interval = 'dataset-triggered'

Review Comment:
   ```suggestion
                   orm_dag.schedule_interval = 'Dataset Triggered'
   ```
   
   Should we make this more human oriented?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #25013:
URL: https://github.com/apache/airflow/pull/25013#discussion_r920294927


##########
airflow/models/dag.py:
##########
@@ -2495,11 +2495,15 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=NEW_SESSION):
             orm_dag.last_parsed_time = timezone.utcnow()
             orm_dag.default_view = dag.default_view
             orm_dag.description = dag.description
-            orm_dag.schedule_interval = dag.schedule_interval
             orm_dag.max_active_tasks = dag.max_active_tasks
             orm_dag.max_active_runs = dag.max_active_runs
             orm_dag.has_task_concurrency_limits = any(t.max_active_tis_per_dag is not None for t in dag.tasks)
-            orm_dag.timetable_description = dag.timetable.description
+            if dag.schedule_on:
+                orm_dag.schedule_interval = 'dataset-triggered'

Review Comment:
   Perhaps `timetable_summary` since that’s where the value comes from?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on a diff in pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
dstandish commented on code in PR #25013:
URL: https://github.com/apache/airflow/pull/25013#discussion_r920220328


##########
airflow/models/dag.py:
##########
@@ -2495,11 +2495,15 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=NEW_SESSION):
             orm_dag.last_parsed_time = timezone.utcnow()
             orm_dag.default_view = dag.default_view
             orm_dag.description = dag.description
-            orm_dag.schedule_interval = dag.schedule_interval
             orm_dag.max_active_tasks = dag.max_active_tasks
             orm_dag.max_active_runs = dag.max_active_runs
             orm_dag.has_task_concurrency_limits = any(t.max_active_tis_per_dag is not None for t in dag.tasks)
-            orm_dag.timetable_description = dag.timetable.description
+            if dag.schedule_on:
+                orm_dag.schedule_interval = 'dataset-triggered'

Review Comment:
   Yeah. Sounds like a good idea. And maybe worth considering renaming to "schedule short name" and "schedule long name" or rather than interval and description? (timetables don't necessarily have an interval)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bbovenzi commented on pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
bbovenzi commented on PR #25013:
URL: https://github.com/apache/airflow/pull/25013#issuecomment-1183543559

   > Does it make sense to continue calling the column "Schedule" if some rows might be triggered by only dataset events? Or should we do something like this:
   
   Yes, I think so. And not always having to prefix with `schedule:` also makes sense.  I think we will need a limit on the number of dataset_ids we show here. Maybe after 3 we just say `Dataset: 4 datasets`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on a diff in pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
dstandish commented on code in PR #25013:
URL: https://github.com/apache/airflow/pull/25013#discussion_r920229370


##########
airflow/models/dag.py:
##########
@@ -2495,11 +2495,15 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=NEW_SESSION):
             orm_dag.last_parsed_time = timezone.utcnow()
             orm_dag.default_view = dag.default_view
             orm_dag.description = dag.description
-            orm_dag.schedule_interval = dag.schedule_interval
             orm_dag.max_active_tasks = dag.max_active_tasks
             orm_dag.max_active_runs = dag.max_active_runs
             orm_dag.has_task_concurrency_limits = any(t.max_active_tis_per_dag is not None for t in dag.tasks)
-            orm_dag.timetable_description = dag.timetable.description
+            if dag.schedule_on:
+                orm_dag.schedule_interval = 'dataset-triggered'

Review Comment:
   i'm talking about the dag orm --- are you suggesting `schedule_description` and `schedule_label`?  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bbovenzi commented on a diff in pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
bbovenzi commented on code in PR #25013:
URL: https://github.com/apache/airflow/pull/25013#discussion_r920222668


##########
airflow/models/dag.py:
##########
@@ -2495,11 +2495,15 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=NEW_SESSION):
             orm_dag.last_parsed_time = timezone.utcnow()
             orm_dag.default_view = dag.default_view
             orm_dag.description = dag.description
-            orm_dag.schedule_interval = dag.schedule_interval
             orm_dag.max_active_tasks = dag.max_active_tasks
             orm_dag.max_active_runs = dag.max_active_runs
             orm_dag.has_task_concurrency_limits = any(t.max_active_tis_per_dag is not None for t in dag.tasks)
-            orm_dag.timetable_description = dag.timetable.description
+            if dag.schedule_on:
+                orm_dag.schedule_interval = 'dataset-triggered'

Review Comment:
   I think "description" is fine. The other could be "label"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #25013:
URL: https://github.com/apache/airflow/pull/25013#discussion_r920214965


##########
airflow/models/dag.py:
##########
@@ -2495,11 +2495,15 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=NEW_SESSION):
             orm_dag.last_parsed_time = timezone.utcnow()
             orm_dag.default_view = dag.default_view
             orm_dag.description = dag.description
-            orm_dag.schedule_interval = dag.schedule_interval
             orm_dag.max_active_tasks = dag.max_active_tasks
             orm_dag.max_active_runs = dag.max_active_runs
             orm_dag.has_task_concurrency_limits = any(t.max_active_tis_per_dag is not None for t in dag.tasks)
-            orm_dag.timetable_description = dag.timetable.description
+            if dag.schedule_on:
+                orm_dag.schedule_interval = 'dataset-triggered'

Review Comment:
   After AIP-39 we now only use `dag.schedule_interval` for display. (I wonder if it’d be cleaner to create a custom timetable for dataset-triggered DAGs, this would make this part easier to follow and have less special cases.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bbovenzi commented on a diff in pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
bbovenzi commented on code in PR #25013:
URL: https://github.com/apache/airflow/pull/25013#discussion_r920188556


##########
airflow/models/dag.py:
##########
@@ -2495,11 +2495,15 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=NEW_SESSION):
             orm_dag.last_parsed_time = timezone.utcnow()
             orm_dag.default_view = dag.default_view
             orm_dag.description = dag.description
-            orm_dag.schedule_interval = dag.schedule_interval
             orm_dag.max_active_tasks = dag.max_active_tasks
             orm_dag.max_active_runs = dag.max_active_runs
             orm_dag.has_task_concurrency_limits = any(t.max_active_tis_per_dag is not None for t in dag.tasks)
-            orm_dag.timetable_description = dag.timetable.description
+            if dag.schedule_on:
+                orm_dag.schedule_interval = 'dataset-triggered'

Review Comment:
   If this is just a label to display, I'd say so. But if we plan to use it anywhere else maybe it should stay as-is/



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
dstandish commented on PR #25013:
URL: https://github.com/apache/airflow/pull/25013#issuecomment-1183553844

   Re changing the column name from "schedule" to trigger", i think it's  a reasonable thing to explore and consider but don't want to take that  on here.  I just want to add a quick messaging fix.  And in the meantime, I don't think that conceptially we're in such a terrible, contradictory place.  Even a timetable, could have a very irregular scheduling cadence, could even be entirely random.  We can think of "scheduling" as the rules by which it's determined when a dag is set to run.  So the run for a dataset-triggered dag is "scheduled" when its upstream datasets are updated -- it's "scheduling" is determined by dataset events.  @blag could you possibly  create a card in the project to represent this concern as a work item? And we can pick it up as availability and priority dictate?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on a diff in pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
dstandish commented on code in PR #25013:
URL: https://github.com/apache/airflow/pull/25013#discussion_r920220328


##########
airflow/models/dag.py:
##########
@@ -2495,11 +2495,15 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=NEW_SESSION):
             orm_dag.last_parsed_time = timezone.utcnow()
             orm_dag.default_view = dag.default_view
             orm_dag.description = dag.description
-            orm_dag.schedule_interval = dag.schedule_interval
             orm_dag.max_active_tasks = dag.max_active_tasks
             orm_dag.max_active_runs = dag.max_active_runs
             orm_dag.has_task_concurrency_limits = any(t.max_active_tis_per_dag is not None for t in dag.tasks)
-            orm_dag.timetable_description = dag.timetable.description
+            if dag.schedule_on:
+                orm_dag.schedule_interval = 'dataset-triggered'

Review Comment:
   Yeah. Sounds like a good idea. And maybe worth considering renaming to "schedule short name" and "schedule long name" or rather than interval and description? (timetables don't necessarily have an interval)
   
   (In the ORM /db)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bbovenzi commented on a diff in pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
bbovenzi commented on code in PR #25013:
URL: https://github.com/apache/airflow/pull/25013#discussion_r920188556


##########
airflow/models/dag.py:
##########
@@ -2495,11 +2495,15 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=NEW_SESSION):
             orm_dag.last_parsed_time = timezone.utcnow()
             orm_dag.default_view = dag.default_view
             orm_dag.description = dag.description
-            orm_dag.schedule_interval = dag.schedule_interval
             orm_dag.max_active_tasks = dag.max_active_tasks
             orm_dag.max_active_runs = dag.max_active_runs
             orm_dag.has_task_concurrency_limits = any(t.max_active_tis_per_dag is not None for t in dag.tasks)
-            orm_dag.timetable_description = dag.timetable.description
+            if dag.schedule_on:
+                orm_dag.schedule_interval = 'dataset-triggered'

Review Comment:
   If this is just a label to display, I'd say so. But if we plan to use it anywhere else maybe it should stay as-is.



##########
airflow/models/dag.py:
##########
@@ -2495,11 +2495,15 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=NEW_SESSION):
             orm_dag.last_parsed_time = timezone.utcnow()
             orm_dag.default_view = dag.default_view
             orm_dag.description = dag.description
-            orm_dag.schedule_interval = dag.schedule_interval
             orm_dag.max_active_tasks = dag.max_active_tasks
             orm_dag.max_active_runs = dag.max_active_runs
             orm_dag.has_task_concurrency_limits = any(t.max_active_tis_per_dag is not None for t in dag.tasks)
-            orm_dag.timetable_description = dag.timetable.description
+            if dag.schedule_on:
+                orm_dag.schedule_interval = 'dataset-triggered'

Review Comment:
   If this is just a label to display, I'd say so. But if we plan to use it anywhere else, it should stay as-is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] blag commented on pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
blag commented on PR #25013:
URL: https://github.com/apache/airflow/pull/25013#issuecomment-1183536355

   Does it make sense to continue calling the column "Schedule" if some rows might be triggered by only dataset events? Or should we do something like this:
   
   | Trigger | Last Run | Next Run |
   | ------- | --------- | --------- |
   | `Schedule: @daily` | ... | ... |
   | (none) |          |            |
   | `Dataset: <dataset-uri>` | ... |      |
   
   I can also see a use case for having a DAG be triggered by both dataset events and by schedule events, like refining an ML model every night and applying that model to live data the next day as it passes through. So a DAG trigger could be displayed as:
   
   `Schedule: @daily`, `Dataset: <dataset-uri>`
   
   If we're worried about the `Schedule: ` or `Dataset: ` prefixes causing unnecessary visual noise, we can simply elide them if all displayed DAG triggers are from the same source (eg: if all DAGs are triggered on schedules, just display `@daily`, with no `Schedule: ` prefix).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish merged pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
dstandish merged PR #25013:
URL: https://github.com/apache/airflow/pull/25013


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #25013: Show 'dataset-triggered' as schedule interval for dataset-triggered dags

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #25013:
URL: https://github.com/apache/airflow/pull/25013#discussion_r920294927


##########
airflow/models/dag.py:
##########
@@ -2495,11 +2495,15 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=NEW_SESSION):
             orm_dag.last_parsed_time = timezone.utcnow()
             orm_dag.default_view = dag.default_view
             orm_dag.description = dag.description
-            orm_dag.schedule_interval = dag.schedule_interval
             orm_dag.max_active_tasks = dag.max_active_tasks
             orm_dag.max_active_runs = dag.max_active_runs
             orm_dag.has_task_concurrency_limits = any(t.max_active_tis_per_dag is not None for t in dag.tasks)
-            orm_dag.timetable_description = dag.timetable.description
+            if dag.schedule_on:
+                orm_dag.schedule_interval = 'dataset-triggered'

Review Comment:
   _Description_ is already used (it’s the thing in the tooltip popup). Perhaps `timetable_summary` since that’s where the value comes from?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org