Posted to commits@airflow.apache.org by "michaelmicheal (via GitHub)" <gi...@apache.org> on 2023/02/10 14:18:49 UTC

[GitHub] [airflow] michaelmicheal commented on a diff in pull request #29441: datasets, next_run_datasets, remove unnecessary timestamp filter

michaelmicheal commented on code in PR #29441:
URL: https://github.com/apache/airflow/pull/29441#discussion_r1102819882


##########
airflow/www/views.py:
##########
@@ -3715,7 +3715,6 @@ def next_run_datasets(self, dag_id):
                     DatasetEvent,
                     and_(
                         DatasetEvent.dataset_id == DatasetModel.id,
-                        DatasetEvent.timestamp > DatasetDagRunQueue.created_at,

Review Comment:
   > However, I would also remove the and_ around it since then there would only be one filter condition in that join:
   
   Yes, you're right, the `and_` becomes unnecessary.
   
   I think there might be some confusion around DDRQ (`DatasetDagRunQueue`). My understanding is that when a `DatasetEvent` is created, one DDRQ record is created per consuming DAG. Then, once a DAG has a DDRQ record for every `Dataset` it depends on, a `DagRun` is created and all DDRQ records associated with that DAG are deleted.
   
   > If you go for option 2, I think you should be able to compare the existence and creation time of the DDRQ with the DatasetEvent timestamp to figure out whether or not the last update time has already triggered a DDRQ/DagRun or if it has partially satisfied the conditions of a future DagRun.
   
   As I understand it, if there are DDRQ records for a DAG, we can assume that no DagRun has been triggered since the last DatasetEvent (because we delete the DDRQ records when a DagRun is created).
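
   To make the lifecycle above concrete, here is a minimal in-memory sketch of how I understand it. This is not Airflow's actual code: `Scheduler`, `consumers`, `on_dataset_event` and `pending` are hypothetical names for illustration, and the real scheduler creates runs in its own loop rather than at event time.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Toy model of the DDRQ lifecycle (assumed behaviour, not Airflow code)."""

    # dag_id -> dataset URIs that the DAG consumes (hypothetical structure)
    consumers: dict[str, set[str]]
    # (dataset_uri, dag_id) pairs standing in for DDRQ rows
    ddrq: set[tuple[str, str]] = field(default_factory=set)
    # dag_ids for which a run was created, in creation order
    dag_runs: list[str] = field(default_factory=list)

    def on_dataset_event(self, uri: str) -> None:
        # A new DatasetEvent queues one DDRQ record per consuming DAG.
        for dag_id, uris in self.consumers.items():
            if uri in uris:
                self.ddrq.add((uri, dag_id))
        self._create_ready_runs()

    def _create_ready_runs(self) -> None:
        for dag_id, uris in self.consumers.items():
            queued = self.pending(dag_id)
            # A run is created only once every consumed dataset is queued.
            if uris and queued == uris:
                self.dag_runs.append(dag_id)
                # Creating the run deletes all DDRQ records for that DAG.
                self.ddrq -= {(u, dag_id) for u in uris}

    def pending(self, dag_id: str) -> set[str]:
        # Datasets updated since the DAG's last run; empty right after a run.
        return {u for u, d in self.ddrq if d == dag_id}


s = Scheduler(consumers={"dag_a": {"ds_1", "ds_2"}})
s.on_dataset_event("ds_1")
print(s.pending("dag_a"), s.dag_runs)  # {'ds_1'} []
s.on_dataset_event("ds_2")
print(s.pending("dag_a"), s.dag_runs)  # set() ['dag_a']
```

   In this model, any record returned by `pending` was necessarily created after the DAG's last run, which is why comparing the event timestamp against the DDRQ `created_at` looks redundant to me.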


