Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/30 17:23:44 UTC

[GitHub] [airflow] dstandish commented on a diff in pull request #25959: Ensure stale dataset references are removed

dstandish commented on code in PR #25959:
URL: https://github.com/apache/airflow/pull/25959#discussion_r958747290


##########
tests/models/test_dag.py:
##########
@@ -692,14 +692,14 @@ def test_bulk_write_to_db(self):
                 assert row[0] is not None
 
         # Re-sync should do fewer queries
-        with assert_queries_count(8):
+        with assert_queries_count(16):

Review Comment:
   so, before, we only needed to look at the python object to determine whether we needed to do any data operations. if there were dataset references, we'd add them.
   
   however, we also need to _remove_ them from the database if they were there before but are no longer on the python objects.
   
   so that means even for dags and tasks that don't have any references, we have to check whether they did _before_. and for ones that _do_ have references, we have to check that there aren't any in the DB that need to be deleted.
   
   so given that, it makes sense that the query count would increase. now, whether the current approach is _optimal_ from a performance perspective, i'm not sure, but i wanted to at least get it working in a reasonable way.
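
to make the reasoning above concrete, here is a minimal sketch of the add-and-remove comparison. it is not the code in this PR: `plan_reference_sync`, the `(dag_id, dataset_uri)` tuple shape, and the `load_stored` callback are all hypothetical names chosen for illustration, and the callback stands in for whatever query fetches the references currently in the DB.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical shape for a reference: (dag_id, dataset_uri).
Ref = Tuple[str, str]


def plan_reference_sync(
    current: Dict[str, Set[str]],
    load_stored: Callable[[str], Set[str]],
) -> Tuple[Set[Ref], Set[Ref]]:
    """Return (to_add, to_delete) for every DAG in `current`.

    `current` maps each dag_id to the dataset URIs found on the python
    objects. `load_stored` is an assumed stand-in for a DB lookup that
    returns the URIs stored for that dag_id.
    """
    to_add: Set[Ref] = set()
    to_delete: Set[Ref] = set()
    for dag_id, now in current.items():
        # The lookup happens even when `now` is empty: a DAG with no
        # references today may still have stale rows from an earlier sync.
        before = load_stored(dag_id)
        to_add |= {(dag_id, uri) for uri in now - before}
        to_delete |= {(dag_id, uri) for uri in before - now}
    return to_add, to_delete
```

in this sketch the number of lookups grows with the number of DAGs rather than with the number of DAGs that have references, which is the same reason the expected query count in the test goes up. batching the lookups into a single query would be one way to reduce it, but that is a separate design question.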


