Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/01 05:39:45 UTC

[GitHub] [airflow] dstandish commented on a change in pull request #19904: Fix labels used to find queued KubeExecutor pods

dstandish commented on a change in pull request #19904:
URL: https://github.com/apache/airflow/pull/19904#discussion_r759868139



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -473,7 +472,7 @@ def clear_not_launched_queued_tasks(self, session=None) -> None:
             base_label_selector = (
                 f"dag_id={pod_generator.make_safe_label_value(task.dag_id)},"
                 f"task_id={pod_generator.make_safe_label_value(task.task_id)},"
-                f"airflow-worker={pod_generator.make_safe_label_value(str(self.scheduler_job_id))}"
+                f"airflow-worker={pod_generator.make_safe_label_value(str(task.queued_by_job_id))}"

Review comment:
       Thanks
   
   > Note: It's using the queued_by_job_id from the TI, which will match the label. Eventually the pod will be adopted by another (or the new) scheduler, however, this can run before that happens (it's on an interval after all) AND this does run before adoption when a scheduler starts.
   
   OK, so you are saying that this process (i.e. read the TI from the db, use it to build the labels, then search for the pod) will, generally speaking, happen before the task would e.g. be failed and retried (at which point it would presumably get a _new_ `queued_by_job_id`, after which the pod, if it were still out there, would no longer be found this way).  If I have you right, then that makes sense and looks good.
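   
   Just to spell out the flow I'm describing, here's a minimal sketch (not the actual executor code; `find_pod_for_queued_ti`, `kube_client` and `namespace` are placeholder names, the selector itself is the one from this PR):
   
   ```python
   from airflow.kubernetes import pod_generator
   
   def find_pod_for_queued_ti(kube_client, namespace, task):
       # Build the label selector from the TI's own queued_by_job_id rather
       # than the current scheduler's job id.
       label_selector = (
           f"dag_id={pod_generator.make_safe_label_value(task.dag_id)},"
           f"task_id={pod_generator.make_safe_label_value(task.task_id)},"
           f"airflow-worker={pod_generator.make_safe_label_value(str(task.queued_by_job_id))}"
       )
       # If the task is failed and retried it gets a new queued_by_job_id,
       # so a pod created under the old id would no longer match this selector.
       return kube_client.list_namespaced_pod(namespace, label_selector=label_selector).items
   ```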
   
   > This becomes important if we consider a shared namespace with multiple Airflow worker pods in it. It becomes even more important if we have the same dags/tasks/scheduled runs
   
   Makes sense.  But that does not sound like a good idea!
   
   Separate note... `airflow-worker` is a misnomer, right?  It makes it sound like a Celery worker... though I have also seen it used to refer to KubernetesExecutor task pods... and it's not _that_ either...
   
   But yeah, now that you mention it... scheduler job ids are probably just incrementing integers, no?  In that case this would not be the most rock-solid of selectors in the shared-namespace scenario.  Though I'm sure collisions would generally be very rare, maybe there should be a random cluster identifier, created somewhere in the db with `airflow db init`, that could be used as a more direct way of keeping things separate.  Roughly what I mean is sketched below.
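   
   Purely hypothetical sketch of that idea (none of these names exist in Airflow today; the identifier and the extra label are made up):
   
   ```python
   import uuid
   
   def generate_cluster_identifier() -> str:
       # Would be generated once at `airflow db init`, stored in the metadata db,
       # and reused by every scheduler/executor in this Airflow deployment.
       return uuid.uuid4().hex
   
   def build_label_selector(task, cluster_id: str) -> str:
       # Same labels as today, plus a hypothetical cluster-scoped label so two
       # deployments sharing a namespace can never match each other's pods.
       return (
           f"dag_id={task.dag_id},"
           f"task_id={task.task_id},"
           f"airflow-worker={task.queued_by_job_id},"
           f"airflow-cluster-id={cluster_id}"
       )
   ```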
   