Posted to commits@airflow.apache.org by "potiuk (via GitHub)" <gi...@apache.org> on 2023/02/11 17:39:34 UTC

[GitHub] [airflow] potiuk commented on issue #29474: Pool slots > 1 handling with priority weight

potiuk commented on issue #29474:
URL: https://github.com/apache/airflow/issues/29474#issuecomment-1426834638

   BTW. In practice, what you would like to achieve is extremely hard, @jossM. If you look at various prioritisation mechanisms in computer science (from process priorities in the kernel to the prioritisation of Pods in Kubernetes), there are no "perfect" solutions that satisfy "I want the high priority tasks to always be first, not preempt the low priority ones, and by the way there should be 0 resource overhead for that". This does not work in practice. And there is very sound math theory behind it.
   
   You have to always give up something:
   * priority
   * resources
   * preempting tasks
   
   It is impossible to have "perfect priority with low latency, no resource overhead, and no preemption of other tasks". Basic math says so.
   
   The setup you have (with fixed pool size) is a choice of giving up priority in exchange for no extra resource overhead, and no preemption. 
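
   A minimal sketch of that trade-off, in plain Python rather than Airflow code. The function `simulate_pool` and the task tuples are made up for illustration; the only assumption is a fixed number of slots and no preemption. A high priority task that arrives while all slots are taken has to wait for a running task to finish, regardless of its weight.

```python
import heapq


def simulate_pool(slots, tasks):
    """Simulate a fixed-size, non-preemptive pool.

    tasks: list of (name, arrival, duration, priority); a larger priority
    wins among *waiting* tasks. Returns {name: start_time}.
    """
    pending = sorted(tasks, key=lambda t: t[1])  # ordered by arrival time
    running = []   # min-heap of finish times, one entry per occupied slot
    waiting = []   # min-heap keyed on (-priority, arrival)
    starts = {}
    now, i = 0, 0
    while i < len(pending) or waiting:
        # Admit everything that has arrived by now.
        while i < len(pending) and pending[i][1] <= now:
            name, arrival, duration, priority = pending[i]
            heapq.heappush(waiting, (-priority, arrival, name, duration))
            i += 1
        # Release slots of tasks that have finished.
        while running and running[0] <= now:
            heapq.heappop(running)
        # Fill free slots, highest priority first. Running tasks are never touched.
        while waiting and len(running) < slots:
            _, _, name, duration = heapq.heappop(waiting)
            starts[name] = now
            heapq.heappush(running, now + duration)
        # Jump to the next arrival or the next finish.
        events = []
        if i < len(pending):
            events.append(pending[i][1])
        if running:
            events.append(running[0])
        if not events:
            break
        now = max(now, min(events))
    return starts


tasks = [
    ("low_a", 0, 10, 1),
    ("low_b", 0, 10, 1),
    ("high", 1, 2, 10),   # arrives one tick after both slots are taken
]
starts = simulate_pool(2, tasks)
print(starts["high"])  # start time of the high priority task
```

   With two slots the high priority task only starts once a low priority one finishes. Priority weight decides the order among waiting tasks, nothing more.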
   
   You could make another choice - for example preempting low-priority tasks (and freeing slots) when a high-priority task is about to start. You would have to implement this yourself, because Airflow never preempts running tasks on purpose. If that would be acceptable for you, you could likely add a monitor to your low-priority tasks that exits immediately if a new high-priority task is in the queued state.
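
   One possible shape for such a monitor, sketched in plain Python. Everything here is hypothetical: `run_low_priority`, `Preempted` and the `high_priority_queued` callable are illustrative names, and how that callable would actually detect queued high-priority tasks in your deployment is left open. The sketch assumes the low-priority work can be split into chunks, so the task can check between chunks and give up its slot.

```python
class Preempted(Exception):
    """Raised when a low-priority task voluntarily gives up its slot."""


def run_low_priority(work_chunks, high_priority_queued, poll_every=1):
    """Run chunks of work, yielding if a high-priority task is waiting.

    work_chunks: list of zero-argument callables, each one unit of work.
    high_priority_queued: hypothetical callable returning True when a
        high-priority task is in the queued state (assumption: you supply it).
    poll_every: check before every N-th chunk.
    """
    done = []
    for n, chunk in enumerate(work_chunks):
        if n % poll_every == 0 and high_priority_queued():
            # Exit early so the slot is freed; completed work is not rolled back.
            raise Preempted(f"yielding after {len(done)} chunks")
        done.append(chunk())
    return done


# Usage: no high-priority task waiting, so all chunks run.
result = run_low_priority([lambda: "a", lambda: "b"], lambda: False)
print(result)
```

   Note that the task that exits this way has to be re-run later, and whatever it had already done must be safe to repeat. That is the cost of choosing preemption.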
   
   Or you could sacrifice resources - for example, rather than using pools to limit the number of parallel tasks, you could use separate queues to run high and low priority tasks. This way, when there are no high priority tasks running, the high priority queue would go idle and you would lose some resources, but it would be ready to pick up high priority tasks immediately when they become available.
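
   To get a rough feel for what that costs, here is a back-of-the-envelope sketch in plain Python. The function `reserved_queue_cost` and the numbers are invented for illustration; it assumes workers dedicated to high priority tasks only, and that all tasks finish within the observed time window.

```python
def reserved_queue_cost(high_tasks, horizon, reserved_workers=1):
    """Estimate wait time and idleness of a queue reserved for high priority.

    high_tasks: list of (arrival, duration) for high priority tasks.
    horizon: length of the observed time window (assumed to cover all tasks).
    Returns (max_wait, idle_fraction) for the reserved workers.
    """
    free_at = [0] * reserved_workers   # time at which each worker is free again
    busy = 0
    max_wait = 0
    for arrival, duration in sorted(high_tasks):
        # Pick the reserved worker that frees up first.
        w = min(range(reserved_workers), key=lambda k: free_at[k])
        start = max(arrival, free_at[w])
        max_wait = max(max_wait, start - arrival)
        free_at[w] = start + duration
        busy += duration
    idle_fraction = 1 - busy / (reserved_workers * horizon)
    return max_wait, idle_fraction


# Two short high priority tasks in a window of 100 time units.
max_wait, idle_fraction = reserved_queue_cost([(1, 2), (20, 2)], horizon=100)
print(max_wait, idle_fraction)
```

   In this made-up example the high priority tasks never wait, but the reserved worker sits idle for almost the whole window. That idle time is the resource you pay for the low latency.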
   
   If, for example, you could design (and likely implement) a proposal for preempting low priority tasks, that would be an interesting feature to consider. But preempting running tasks is not a good idea in general.
   
   But I don't think you are out of options. You have not explained why you are really using pools - but maybe if you decide what you can sacrifice a bit, with the right combination of queues and pools you can likely achieve what you want. But you have to decide which of the three is the priority for you. It is the same as with short time, high quality and low cost: managers often think they can have all of those at once, but in practice they need to decide which of the three to give up on.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
