Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/21 17:35:35 UTC

[GitHub] [arrow-ballista] andygrove commented on issue #30: [Discuss] Ballista Future Direction

andygrove commented on issue #30:
URL: https://github.com/apache/arrow-ballista/issues/30#issuecomment-1133705734

   Thanks for starting this discussion @thinkharderdev :heart:
   
   With Ballista moving to this new repository I think it is an excellent time to "reboot" the project and assess what we are trying to build here.
   
   I'd like to provide some historical context for how we ended up where we are today:
   
   The original goal with Ballista was essentially "rewrite Apache Spark in Rust", but without an architecture that heavily favors a particular programming language (Scala, in Spark's case). This is why we serialize plans to protobuf format rather than just using Rust's "serde" crate, which would have been much easier. I now hope that we can eventually adopt https://substrait.io/ as the serialization format, to make it easier for Ballista to leverage query engines other than DataFusion.
   
   Quite early in the development process, I discovered Apache Arrow and made it a core part of the design as well. In my mind, this was another clear advantage over Apache Spark, which is largely row-based.
   
   Obviously, the choice of Rust is another major differentiator, given its unique approach to memory management and safety.
   
   I designed Ballista based on my experience of using Apache Spark for SQL/ETL batch jobs.
   
   I would fully support Ballista handling both batch and streaming, and I think it would be fine for a user to pick one or the other when executing a query, using different APIs for each case. That said, I have not looked into this, so there are likely complications that I am not even aware of. I will start learning more about streaming in Spark and Flink so that I can contribute better to the discussion.

