Posted to commits@airflow.apache.org by "bolkedebruin (via GitHub)" <gi...@apache.org> on 2023/02/20 09:21:55 UTC

[GitHub] [airflow] bolkedebruin commented on a diff in pull request #29433: Add dataset update endpoint

bolkedebruin commented on code in PR #29433:
URL: https://github.com/apache/airflow/pull/29433#discussion_r1111675033


##########
airflow/datasets/manager.py:
##########
@@ -55,23 +61,33 @@ def register_dataset_change(
         dataset_model = session.query(DatasetModel).filter(DatasetModel.uri == dataset.uri).one_or_none()
         if not dataset_model:
             self.log.warning("DatasetModel %s not found", dataset)
-            return
-        session.add(
-            DatasetEvent(
+            return None
+
+        if task_instance:
+            dataset_event = DatasetEvent(
                 dataset_id=dataset_model.id,
                 source_task_id=task_instance.task_id,
                 source_dag_id=task_instance.dag_id,
                 source_run_id=task_instance.run_id,
                 source_map_index=task_instance.map_index,
                 extra=extra,
             )
-        )
+        else:
+            # When an external dataset change is made through the API, it isn't triggered by a task instance,
+            # so we create a DatasetEvent without the task and dag data.
+            dataset_event = DatasetEvent(

Review Comment:
   It would be useful to record extra information when a dataset is changed externally, such as:
   
   * by whom: `external_auth_id` or `external_service_id` (required)
   * from where (API, client_ip / remote_addr): `external_source` (required)
   * the timestamp of the actual event, so it can be reconciled later if needed (nullable, as it might not be available)
   
   This keeps lineage from breaking across systems.
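
   A minimal sketch of how these fields could be validated before an event is created. The field names `external_auth_id`, `external_service_id` and `external_source` come from the suggestion above; the class `ExternalChangeInfo`, the `source_timestamp` name and the `to_extra()` helper are hypothetical and not part of this PR or of Airflow. Whether the values end up in dedicated `DatasetEvent` columns or in `extra` is left open here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ExternalChangeInfo:
    """Hypothetical provenance record for an externally triggered dataset change."""

    # Where the change came from, e.g. "api" plus the remote address (required).
    external_source: str
    # Who made the change: at least one of these two must be set.
    external_auth_id: Optional[str] = None
    external_service_id: Optional[str] = None
    # When the change actually happened upstream; may be unknown.
    source_timestamp: Optional[datetime] = None

    def __post_init__(self) -> None:
        if not self.external_source:
            raise ValueError("external_source is required")
        if not (self.external_auth_id or self.external_service_id):
            raise ValueError("external_auth_id or external_service_id is required")

    def to_extra(self) -> Dict[str, Any]:
        """Serialize to a plain dict, e.g. to merge into an event's `extra`."""
        return {
            "external_source": self.external_source,
            "external_auth_id": self.external_auth_id,
            "external_service_id": self.external_service_id,
            "source_timestamp": (
                self.source_timestamp.isoformat() if self.source_timestamp else None
            ),
        }
```

   With something like this, the `else` branch in the diff could reject an API call that carries no caller identity instead of silently writing an event with no provenance.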


