Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/01/15 04:19:37 UTC

[GitHub] ramandumcs commented on issue #4434: [AIRFLOW-3516] Support to create k8 worker pods in batches

ramandumcs commented on issue #4434: [AIRFLOW-3516] Support to create k8 worker pods in batches
URL: https://github.com/apache/airflow/pull/4434#issuecomment-454262283
 
 
   Thanks @dimberman for looking into this PR.
   
   In the current implementation, the K8 executor creates only one k8 worker pod per scheduler loop.
   The scheduler calls self.executor.heartbeat() in jobs.py, which in turn calls self.sync(), and each call to self.sync() submits a single k8 worker pod.
   task_queue might hold hundreds of tasks waiting to run, but only one task is submitted per scheduler loop, which increases task scheduling latency.
   
   Each scheduler loop takes at least 1 second, so with 1000 tasks in the task queue the last task waits at least 1000 seconds before it is submitted.
   (We have also observed the scheduler loop sometimes taking ~2 to 3 seconds, which increases the task latency further.)
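
   The arithmetic above can be sketched as a small model. This is only an illustration, not Airflow code: the function name and its parameters (loop_seconds, pods_per_loop) are hypothetical, and it assumes every scheduler loop takes the same fixed time.

```python
import math


def worst_case_latency(num_tasks, loop_seconds=1.0, pods_per_loop=1):
    """Seconds until the last queued task is submitted.

    Assumes a fixed loop duration and that exactly `pods_per_loop`
    tasks are submitted in every scheduler loop (hypothetical model).
    """
    loops_needed = math.ceil(num_tasks / pods_per_loop)
    return loops_needed * loop_seconds


# One pod per loop, 1 second per loop: the case described above.
print(worst_case_latency(1000))                      # 1000.0
# Same queue when a loop takes 3 seconds.
print(worst_case_latency(1000, loop_seconds=3.0))    # 3000.0
# Same queue if 50 pods were submitted per loop.
print(worst_case_latency(1000, pods_per_loop=50))    # 20.0
```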
   
   Our use case is to run thousands of concurrent tasks. We observed and investigated this behaviour after we started using Airflow with the K8 executor.
   So we are proposing a fix that gives some control over the number of worker pods submitted per loop. Ideally all queued tasks would be submitted in each loop, but because these are synchronous calls, that could slow down the scheduler's processing of DAG files.
   So we made this configurable.
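
   A minimal sketch of the idea, assuming a plain queue.Queue and a caller-supplied submit function. This is not the code in the PR: sync_batch, submit and batch_size are hypothetical names used only for illustration.

```python
import queue


def sync_batch(task_queue, submit, batch_size=1):
    """Submit up to `batch_size` queued tasks and return how many were submitted.

    `submit` stands in for the synchronous pod-creation call, so a larger
    batch_size keeps the scheduler loop busy for longer.
    """
    submitted = 0
    for _ in range(batch_size):
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to submit in this loop
        try:
            submit(task)
            submitted += 1
        finally:
            task_queue.task_done()
    return submitted


# Usage: five queued tasks, at most three submitted per loop.
pending = queue.Queue()
for task_id in range(5):
    pending.put(task_id)

created = []
print(sync_batch(pending, created.append, batch_size=3))  # 3
print(sync_batch(pending, created.append, batch_size=3))  # 2
```

   With batch_size=1 this behaves like the current one-pod-per-loop submission; raising it trades a longer loop for lower scheduling latency.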
   Please let us know if this makes sense, and please share your thoughts/comments.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services