Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/09 08:21:07 UTC

[GitHub] [arrow-ballista] thinkharderdev commented on issue #119: ExecutorReservation during task scheduling

thinkharderdev commented on issue #119:
URL: https://github.com/apache/arrow-ballista/issues/119#issuecomment-1209067748

   > Since we need to update the available task slots for each executor atomically, I think compare-and-swap operations can satisfy the needs perfectly. 
   
   This is a bit problematic with etcd since it doesn't have a CAS primitive. Instead you have transactions, which can be used to build a CAS operation, but etcd also doesn't have an increment/decrement primitive, so updating a slot count effectively requires multiple round trips: read the current value, then issue a transaction that writes the new value only if the key is unchanged since the read, retrying on conflict. 
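   To make that concrete, here is a minimal sketch (not Ballista's actual code; the `etcd-client` crate usage and the `slots_key` layout are assumptions) of a CAS-style slot reservation built on an etcd transaction:
   
   ```rust
   use etcd_client::{Client, Compare, CompareOp, Error, Txn, TxnOp};
   
   /// Try to reserve one task slot on an executor by decrementing its slot
   /// count. Returns Ok(true) if a slot was reserved.
   async fn try_reserve_slot(client: &mut Client, slots_key: &str) -> Result<bool, Error> {
       loop {
           // Round trip 1: read the current slot count and its revision.
           let resp = client.get(slots_key, None).await?;
           let Some(kv) = resp.kvs().first() else {
               return Ok(false); // executor key missing
           };
           let slots: u32 = kv.value_str()?.parse().unwrap_or(0);
           if slots == 0 {
               return Ok(false); // no free slots on this executor
           }
           // Round trip 2: write slots - 1 only if nobody touched the key
           // since the read.
           let txn = Txn::new()
               .when(vec![Compare::mod_revision(
                   slots_key,
                   CompareOp::Equal,
                   kv.mod_revision(),
               )])
               .and_then(vec![TxnOp::put(slots_key, (slots - 1).to_string(), None)]);
           if client.txn(txn).await?.succeeded() {
               return Ok(true);
           }
           // Lost the race against another scheduler; re-read and retry.
       }
   }
   ```
   
   Comparing on `mod_revision` rather than the value sidesteps ABA issues, but each attempt is still a read plus a transaction, which is exactly the overhead described above.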
   
   > And I think this statement "When tasks finish we want to preferentially assign new tasks from the same job" is not true.
   > In a system which has multiple SQL jobs running concurrently (multiple SQL jobs compete for the resources), we should support FAIR or FIFO scheduling. If we prefer to assign new tasks for the same job, it could starve other jobs.
   
   I agree that what we really need is a priority assigned at the job level. The current approach is mostly a hack until we have that in place. There are two reasons for the current approach: 
   
   1. Without some sort of priority queue for jobs, we want to avoid a "livelock" situation where the system is handling more queries than it has capacity for and no query completes because the scheduler is just randomly assigning tasks. Preferring tasks from the same job at least ensures that a job, once started, will complete (see the sketch after this list). 
   2. Ideally we want to take data locality into account. If we can schedule a task on the same executor where its input partitions were written, we avoid network overhead when reading shuffles. The current approach doesn't do that exactly, but it makes it more probable, and I think ultimately we should do it explicitly. 
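   As a rough illustration of both points, here is a minimal sketch (illustrative types and names, not Ballista's actual API) of the selection heuristic: when a slot frees up, prefer a pending task from the same job, ideally one whose inputs already live on that executor:
   
   ```rust
   /// Illustrative pending task; Ballista's real task representation differs.
   struct PendingTask {
       job_id: String,
       /// Executors holding the shuffle partitions this task reads.
       input_executors: Vec<String>,
   }
   
   /// Pick the next task for a slot freed on `executor_id` by a task of
   /// `finished_job_id`.
   fn pick_next_task(
       queue: &mut Vec<PendingTask>,
       finished_job_id: &str,
       executor_id: &str,
   ) -> Option<PendingTask> {
       if queue.is_empty() {
           return None;
       }
       let pos = queue
           .iter()
           // Best: same job and inputs local to the freed executor (points 1 and 2).
           .position(|t| {
               t.job_id == finished_job_id
                   && t.input_executors.iter().any(|e| e == executor_id)
           })
           // Next best: same job, so a started job keeps making progress (point 1).
           .or_else(|| queue.iter().position(|t| t.job_id == finished_job_id))
           // Fallback: plain FIFO.
           .unwrap_or(0);
       Some(queue.remove(pos))
   }
   ```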
   
   
   Ultimately, though, I think the direction this all points in is to separate concerns a bit in the scheduler tier. The schedulers should elect a leader which is responsible for task scheduling (and can use in-memory data structures, as you mention), while the followers handle the planning. That will add a lot of complexity to the implementation but should remove basically all of the need for locks on the state.  
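   For example, with etcd as the backing store, leadership could be built on its election API. The sketch below assumes the `etcd-client` crate; the election key and the scheduling loop are hypothetical:
   
   ```rust
   use etcd_client::{Client, Error};
   
   /// Campaign for scheduler leadership; only the winner runs task scheduling.
   async fn run_scheduler(mut client: Client, scheduler_id: &str) -> Result<(), Error> {
       // Tie leadership to a lease so it lapses if this scheduler dies.
       // (A real implementation must keep the lease alive in the background.)
       let lease = client.lease_grant(10, None).await?;
       let mut election = client.election_client();
       // Blocks until this scheduler wins the election.
       election
           .campaign("/ballista/scheduler-leader", scheduler_id, lease.id())
           .await?;
       // Leader-only duty: drive task assignment from in-memory state.
       // Followers never reach this point and just keep serving planning requests.
       // run_task_scheduling_loop().await; // hypothetical
       Ok(())
   }
   ```
   
   Since only the leader mutates scheduling state, slot counts can live in ordinary in-memory structures and the read-then-transact dance above disappears entirely.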

