Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/06 23:10:21 UTC

[GitHub] [airflow] ashb commented on a change in pull request #14048: Speed up clear_task_instances by doing a single sql delete for TaskReschedule

ashb commented on a change in pull request #14048:
URL: https://github.com/apache/airflow/pull/14048#discussion_r571501423



##########
File path: airflow/models/taskinstance.py
##########
@@ -166,13 +167,23 @@ def clear_task_instances(
                 ti.max_tries = max(ti.max_tries, ti.prev_attempted_tries)
             ti.state = State.NONE
             session.merge(ti)
+
+        tr_filter.append((ti.dag_id, ti.task_id, ti.execution_date, ti.try_number))
+
+    if tr_filter:
         # Clear all reschedules related to the ti to clear
-        session.query(TR).filter(
-            TR.dag_id == ti.dag_id,
-            TR.task_id == ti.task_id,
-            TR.execution_date == ti.execution_date,
-            TR.try_number == ti.try_number,
-        ).delete()
+        delete_qry = TR.__table__.delete().where(
+            or_(
+                and_(
+                    TR.dag_id == dag_id,
+                    TR.task_id == task_id,
+                    TR.execution_date == execution_date,
+                    TR.try_number == try_number,
+                )
+                for dag_id, task_id, execution_date, try_number in tr_filter

Review comment:
   If these are all for the same DAG (which it looks like they _should_ be, given the API), then this query would be much quicker if `TR.dag_id == dag.dag_id` is pulled out of the loop, so that the condition appears once rather than once per task instance.

   (This was an optimization I made to `filter_for_tis`.)
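
   To illustrate the shape of the query being suggested, here is a minimal sketch using the standard-library `sqlite3` module and raw SQL rather than SQLAlchemy. The table layout and the names `build_delete` and `demo` are hypothetical and are not part of the PR or of Airflow; the sketch also assumes every key belongs to the same DAG, which is the precondition of the suggestion.

```python
import sqlite3


def build_delete(dag_id, keys):
    """Build a DELETE whose dag_id condition is hoisted out of the OR.

    `keys` is a list of (task_id, execution_date, try_number) tuples,
    all assumed to belong to `dag_id`.
    """
    if not keys:
        # Mirrors the `if tr_filter:` guard in the diff: nothing to delete.
        raise ValueError("keys must not be empty")
    # One AND group per task instance, without repeating dag_id.
    per_ti = " OR ".join(
        "(task_id = ? AND execution_date = ? AND try_number = ?)" for _ in keys
    )
    sql = f"DELETE FROM task_reschedule WHERE dag_id = ? AND ({per_ti})"
    params = [dag_id]
    for key in keys:
        params.extend(key)
    return sql, params


def demo():
    """Run the hoisted delete against an in-memory table, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_reschedule ("
        "dag_id TEXT, task_id TEXT, execution_date TEXT, try_number INTEGER)"
    )
    conn.executemany(
        "INSERT INTO task_reschedule VALUES (?, ?, ?, ?)",
        [
            ("dag_a", "t1", "2021-02-01", 1),
            ("dag_a", "t2", "2021-02-01", 1),
            ("dag_a", "t3", "2021-02-01", 1),
            ("dag_b", "t1", "2021-02-01", 1),
        ],
    )
    sql, params = build_delete(
        "dag_a", [("t1", "2021-02-01", 1), ("t2", "2021-02-01", 1)]
    )
    conn.execute(sql, params)
    remaining = conn.execute(
        "SELECT dag_id, task_id FROM task_reschedule ORDER BY dag_id, task_id"
    ).fetchall()
    conn.close()
    return remaining
```

   In this sketch the `dag_id = ?` condition appears once regardless of how many task instances are cleared, and the matching row in `dag_b` is left untouched. Whether this is actually faster in practice depends on the database and its indexes.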



