Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/19 07:43:13 UTC

[GitHub] [arrow-datafusion] houqp commented on pull request #1560: Introduce push-based task scheduling for Ballista

houqp commented on pull request #1560:
URL: https://github.com/apache/arrow-datafusion/pull/1560#issuecomment-1016165577


   @realno to your question on poll vs. push, I think you are spot on with regard to poll being simpler in design. @edrevo and @j4ckcyw had some good prior discussions on this topic as well at https://github.com/ballista-compute/ballista/issues/463.
   
   My current thinking on this is that the scheduler state complexity introduced by the push model might be needed in the long run for optimal task scheduling, once we take more resource-related factors into account instead of just available task slots. Having a global view of the system generally yields better task placements.
   
   On the scaling front, even though the push-based model incurs more state management overhead on the scheduler side, it has its own scaling strengths as well. For example, active message exchanges between the scheduler and executors scale much better with larger executor pool sizes because heartbeat messages are much easier to process than task poll requests. It is also easier for the scheduler to reduce its own load by proactively scheduling fewer tasks than by rate limiting executor poll requests in a polling model.
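
   To make the push side of this concrete, here is a rough sketch of a scheduler that keeps a global view from heartbeats and throttles itself by capping how many tasks it pushes per round. All names (`PushScheduler`, `on_heartbeat`, `schedule`, `max_tasks`) are hypothetical and are not Ballista's actual API; free task slots are the only resource modeled, and "most free slots first" is just a placeholder placement policy.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical scheduler-side view of one executor, refreshed by heartbeats.
#[derive(Debug, Clone)]
struct ExecutorState {
    free_slots: usize,
}

/// Hypothetical push-based scheduler that owns the global cluster view.
#[derive(Debug, Default)]
struct PushScheduler {
    executors: HashMap<String, ExecutorState>,
    pending_tasks: VecDeque<u64>,
}

impl PushScheduler {
    /// A heartbeat only overwrites one map entry, so it is cheap to process.
    fn on_heartbeat(&mut self, executor_id: &str, free_slots: usize) {
        self.executors
            .insert(executor_id.to_string(), ExecutorState { free_slots });
    }

    /// Queue a task for a later scheduling round.
    fn submit(&mut self, task_id: u64) {
        self.pending_tasks.push_back(task_id);
    }

    /// Push at most `max_tasks` pending tasks in this round. Lowering
    /// `max_tasks` is how the scheduler sheds its own load. Each task goes
    /// to the executor with the most free slots (ties: smallest id).
    /// Returns the (task_id, executor_id) assignments to send out.
    fn schedule(&mut self, max_tasks: usize) -> Vec<(u64, String)> {
        let mut assignments = Vec::new();
        while assignments.len() < max_tasks && !self.pending_tasks.is_empty() {
            let best = self
                .executors
                .iter()
                .filter(|(_, state)| state.free_slots > 0)
                .max_by(|a, b| {
                    a.1.free_slots
                        .cmp(&b.1.free_slots)
                        .then_with(|| b.0.cmp(a.0))
                })
                .map(|(id, _)| id.clone());
            let executor_id = match best {
                Some(id) => id,
                None => break, // no capacity anywhere: keep the rest queued
            };
            if let Some(task_id) = self.pending_tasks.pop_front() {
                if let Some(state) = self.executors.get_mut(&executor_id) {
                    state.free_slots -= 1;
                }
                assignments.push((task_id, executor_id));
            }
        }
        assignments
    }
}
```

   In a polling model the equivalent throttle would have to reject or delay incoming poll requests; here the scheduler simply calls `schedule` with a smaller `max_tasks`.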
   
   So I think there is no clear-cut answer here. Perhaps another angle would be a hybrid model where we still use the base model as the foundation, but try to move as much state management logic as possible into the executors to reduce the overhead on the scheduler side.

