Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/10 17:43:44 UTC

[GitHub] [airflow] andrewgodwin opened a new pull request #18152: Fix stuck "queued" tasks in KubernetesExecutor

andrewgodwin opened a new pull request #18152:
URL: https://github.com/apache/airflow/pull/18152


   Under certain circumstances, TaskInstances running under KubernetesExecutor can get "stuck" in the QUEUED state: they claim to have a pod scheduled (and so are queued) but do not actually have one, and so they sit there forever.
   
   It appears this happens occasionally with reschedule sensors and now more often with deferrable tasks, when the task instance defers/reschedules and then resumes before the old pod has vanished. It would also, I believe, happen when the Executor hard-exits with items still in its internal queues.
   
   There was a pre-existing method in there to clean up stuck queued tasks, but it only ran once, on executor start. I have modified it to be safe to run periodically (by teaching it not to touch things that the executor looked at recently), and then made it run every so often (30 seconds by default).
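A minimal sketch of the idea described above, assuming a simplified model: a periodic sweep that flags QUEUED tasks with no backing pod, but skips any task the executor itself handled within the last interval (it may still be mid-launch). All names here (`QueuedTaskSweeper`, `mark_handled`, `maybe_sweep`, the tuple-shaped task key) are hypothetical and this is not the actual KubernetesExecutor code; it also assumes the "recently" window equals the sweep interval, with the 30-second default taken from the description.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical task key: (dag_id, task_id, run_id, try_number).
TaskKey = Tuple[str, str, str, int]


@dataclass
class QueuedTaskSweeper:
    """Hypothetical sketch; not Airflow's actual implementation."""

    check_interval: float = 30.0  # seconds between sweeps (assumed default)
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    last_handled: Dict[TaskKey, float] = field(default_factory=dict)
    _last_sweep: float = float("-inf")

    def mark_handled(self, key: TaskKey) -> None:
        # Record that the executor just touched this task (e.g. launched a pod).
        self.last_handled[key] = self.clock()

    def maybe_sweep(self, queued: Iterable[TaskKey], pods: Set[TaskKey]) -> List[TaskKey]:
        """Return queued tasks that have no pod and were not handled recently."""
        now = self.clock()
        if now - self._last_sweep < self.check_interval:
            return []  # too soon since the previous sweep
        self._last_sweep = now
        stuck: List[TaskKey] = []
        for key in queued:
            if key in pods:
                continue  # a pod exists, so the task is genuinely queued
            touched = self.last_handled.get(key)
            if touched is not None and now - touched < self.check_interval:
                continue  # executor looked at it recently; may still be launching
            stuck.append(key)
        return stuck
```

The caller would then hand the returned tasks back to the scheduler (for example by resetting their state) so they get re-queued; that step is left out because its exact form is an assumption. Skipping recently handled tasks is what makes the sweep safe to run repeatedly rather than only at executor start.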
   
   This is not a perfect fix: the only real fix would be far more detailed state tracking, as part of TaskInstance or in another table, and a re-architected KubernetesExecutor. However, this should very significantly reduce how often the problem occurs, so it should do for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #18152: Fix stuck "queued" tasks in KubernetesExecutor

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #18152:
URL: https://github.com/apache/airflow/pull/18152


   





[GitHub] [airflow] andrewgodwin commented on a change in pull request #18152: Fix stuck "queued" tasks in KubernetesExecutor

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on a change in pull request #18152:
URL: https://github.com/apache/airflow/pull/18152#discussion_r706362385



##########
File path: .pre-commit-config.yaml
##########
@@ -250,7 +250,7 @@ repos:
         exclude: |
           (?x)
           ^airflow/_vendor/
-  - repo: https://github.com/ikamensh/flynt/
+  - repo: https://github.com/ikamensh/flynt

Review comment:
       This is a local copy of https://github.com/apache/airflow/pull/18151 to let me get the commit in; it will vanish upon rebase when that lands.







[GitHub] [airflow] andrewgodwin commented on pull request #18152: Fix stuck "queued" tasks in KubernetesExecutor

Posted by GitBox <gi...@apache.org>.
andrewgodwin commented on pull request #18152:
URL: https://github.com/apache/airflow/pull/18152#issuecomment-918292229


   We've tested this and it appears to fix the problem, so now putting it up for review.

