You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/09 13:52:44 UTC

[GitHub] [airflow] dstandish commented on a diff in pull request #24908: Dataset event table

dstandish commented on code in PR #24908:
URL: https://github.com/apache/airflow/pull/24908#discussion_r917270799


##########
airflow/models/dataset.py:
##########
@@ -199,3 +199,86 @@ def __repr__(self):
         for attr in [x.name for x in self.__mapper__.primary_key]:
             args.append(f"{attr}={getattr(self, attr)!r}")
         return f"{self.__class__.__name__}({', '.join(args)})"
+
+
+class DatasetEvent(Base):
+    """
+    A table to store datasets events.
+
+    :param dataset_id: reference to Dataset record
+    :param extra: JSON field for arbitrary extra info
+    :param source_task_id: the task_id of the TI which updated the dataset
+    :param source_dag_id: the dag_id of the TI which updated the dataset
+    :param source_run_id: the run_id of the TI which updated the dataset
+    :param source_map_index: the map_index of the TI which updated the dataset
+
+    We use relationships instead of foreign keys so that dataset events are not deleted even
+    if the foreign key object is.
+    """
+
+    id = Column(Integer, primary_key=True, autoincrement=True)
+    dataset_id = Column(Integer, nullable=False)
+    extra = Column(ExtendedJSON, nullable=True)
+    source_task_id = Column(StringID(), nullable=True)
+    source_dag_id = Column(StringID(), nullable=True)
+    source_run_id = Column(StringID(), nullable=True)
+    source_map_index = Column(Integer, nullable=True, server_default=text("-1"))

Review Comment:
   Yeah, it's a good question.  So with dataset events, there tend to be multiple things implicated.  There's the task that updated the dataset.  But then there's the also dag or task that depends on the dataset, and will be triggered by the the update.  And e.g. with the queue table, we definitely needed to make clear we're talking about the target dag id.  Here though, I think you're probably right, we probably don't need it because it's only about writes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org