Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/06 15:27:40 UTC

[GitHub] [airflow] ashb commented on a diff in pull request #23846: Do not fail requeued TIs

ashb commented on code in PR #23846:
URL: https://github.com/apache/airflow/pull/23846#discussion_r890269996


##########
airflow/jobs/scheduler_job.py:
##########
@@ -664,7 +663,20 @@ def _process_executor_events(self, session: Session = None) -> int:
                 ti.pid,
             )
 
-            if ti.try_number == buffer_key.try_number and ti.state == State.QUEUED:
+            # There are two scenarios why the same TI with the same try_number is queued
+            # after executor is finished with it:
+            # 1) the TI was killed externally and it had no time to mark itself failed
+            # - in this case we should mark it as failed here.
+            # 2) the TI has been requeued after getting deferred - in this case either our executor has it
+            # or the TI is queued by another job. Either way we should not fail it.
+
+            # All of this could also happen if the state is "running",
+            # but that is handled by the zombie detection.
+
+            ti_queued = ti.try_number == buffer_key.try_number and ti.state == TaskInstanceState.QUEUED
+            ti_requeued = ti.queued_by_job_id != self.id or self.executor.has_task(ti)

Review Comment:
   There is also a `queued_dttm` column -- is it worth checking that its value is "recent"? (I don't know the answer, just asking the question.)
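
For illustration only, here is a minimal sketch of what such a recency check might look like. The helper name `queued_recently` and the 60-second window are hypothetical and are not part of this PR or of Airflow; only the `queued_dttm` column comes from the code under review. The sketch assumes timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold; the PR does not define what "recent" means.
RECENT_QUEUE_WINDOW = timedelta(seconds=60)


def queued_recently(
    queued_dttm: Optional[datetime],
    now: Optional[datetime] = None,
    window: timedelta = RECENT_QUEUE_WINDOW,
) -> bool:
    """Return True if queued_dttm lies within `window` before `now`."""
    if queued_dttm is None:
        # No queue timestamp recorded, so we cannot call it recent.
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - queued_dttm
    # A negative age (timestamp in the future) is treated as not recent.
    return timedelta(0) <= age <= window
```

Whether a check like this would be combined with `ti_requeued`, and what window would be appropriate, is an open question rather than a recommendation.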


