Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/14 21:30:20 UTC

[GitHub] [arrow-datafusion] realno commented on pull request #1560: Introduce push-based task scheduling

realno commented on pull request #1560:
URL: https://github.com/apache/arrow-datafusion/pull/1560#issuecomment-1013486440


   @yahoNanJing it is a very well written document, great work!
   
   I am wondering whether you investigated any options based on the original poll model and, if so, what you found. I think the poll/pull model has several benefits:
   
   - The scheduler and executors are better decoupled. The scheduler does not need any knowledge of the executors; its job is to construct and optimize the plan. The executors, in turn, only need to know where to get tasks, and this can be further abstracted behind a queuing or messaging system. It is a fairly clean design and can scale pretty well.
   - The system maintains minimal state, which helps its stability and resilience.
   - The system is less complex than it would be with the push model.
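   
   To make the decoupling concrete, here is a minimal sketch of the pull model using only the Rust standard library. It is not Ballista's actual code: `Task`, `TaskQueue`, `run_executor` and `run_cluster` are hypothetical names, and an in-process queue stands in for whatever queuing or messaging system would sit between the scheduler and the executors.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical task; a real one would carry a fragment of the query plan.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub id: u32,
}

/// Assumed stand-in for the queue between scheduler and executors.
/// The scheduler only pushes; it never tracks who the executors are.
#[derive(Clone, Default)]
pub struct TaskQueue {
    inner: Arc<Mutex<VecDeque<Task>>>,
}

impl TaskQueue {
    /// Scheduler side: publish a task.
    pub fn push(&self, task: Task) {
        self.inner.lock().unwrap().push_back(task);
    }

    /// Executor side: try to pull the next task, if any.
    pub fn poll(&self) -> Option<Task> {
        self.inner.lock().unwrap().pop_front()
    }
}

/// Executor loop: pull tasks until the queue is empty and return the ids run.
/// A long-lived executor would sleep and poll again instead of returning,
/// which is where the extra polling latency comes from.
pub fn run_executor(queue: &TaskQueue) -> Vec<u32> {
    let mut done = Vec::new();
    while let Some(task) = queue.poll() {
        done.push(task.id);
    }
    done
}

/// Enqueue `num_tasks` tasks, let `num_executors` threads drain the queue,
/// and return the sorted ids of all tasks that were executed.
pub fn run_cluster(num_executors: usize, num_tasks: u32) -> Vec<u32> {
    let queue = TaskQueue::default();
    for id in 0..num_tasks {
        queue.push(Task { id });
    }
    let handles: Vec<_> = (0..num_executors)
        .map(|_| {
            let q = queue.clone();
            thread::spawn(move || run_executor(&q))
        })
        .collect();
    let mut all: Vec<u32> = handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect();
    all.sort();
    all
}
```

   In this sketch the only shared state is the queue itself, so executors can be added or removed without the scheduler noticing.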
   
   Regarding the original issue, I see a good reason to try to reduce CPU usage. As for query time, is it that critical for DataFusion use cases? IMO we would be optimizing for large distributed jobs, so perhaps we can live with a delay of a few milliseconds here and there.
   
   Again, thanks for the proposal. I am curious about what you and other contributors think.
   
   BTW, I have recently been thinking about making Ballista production ready and having it work well with modern cloud-native architectures; I think you are looking into the same topic. I would be happy to discuss it.

