Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/14 11:20:18 UTC

[GitHub] [arrow-datafusion] yjshen commented on issue #1221: Task assignment between Scheduler and Executors

yjshen commented on issue #1221:
URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013033180


   @jon-chuang Thanks for bringing this up. I may be misunderstanding something about Ray; please point it out if so.
   
   IMHO, Ray is designed to ease the development of general-purpose distributed programs. It's more like "parallelize your machine learning code and run it on a cluster without pain", just like what you have provided in the code sample above.
   
   On the other hand, Ballista is meant to be a distributed SQL query engine. The code it has to distribute and run is quite limited: it is all about DataFusion's small, fixed set of physical operators. So what should I expect from a Ray integration? Does Ray provide core abilities like task scheduling, keepalive monitoring, straggler detection, and speculative task execution, so that I could easily build a distributed SQL engine on top of DataFusion with little effort?

