Posted to dev@drill.apache.org by Paul Rogers <pa...@yahoo.com.INVALID> on 2018/08/22 00:32:15 UTC

The Ray framework

Hi All,

There is a cool new distributed framework coming out of UC Berkeley: Ray [1]. It comes from RISELab, the successor to the AMPLab that produced Spark. The Ray paper [2] provides a great overview.

(quote)
Ray is a high-performance distributed execution framework targeted at large-scale machine learning and reinforcement learning applications. It achieves scalability and fault tolerance by abstracting the control state of the system in a global control store and keeping all other components stateless. It uses a shared-memory distributed object store to efficiently handle large data through shared memory, and it uses a bottom-up hierarchical scheduling architecture to achieve low-latency and high-throughput scheduling. It uses a lightweight API based on dynamic task graphs and actors to express a wide range of applications in a flexible manner.

... Ray implements a dynamic task graph computation model that supports both the task-parallel and the actor programming models. To meet the performance requirements of AI applications, we propose an architecture that logically centralizes the system's control state using a sharded storage system and a novel bottom-up distributed scheduler. In our experiments, we demonstrate sub-millisecond remote task latencies and linear throughput scaling beyond 1.8 million tasks per second. We empirically validate that Ray speeds up challenging benchmarks...
(unquote)

While Ray is targeted at machine learning (ML), it is not hard to imagine using Ray to run query DAGs for Drill. There are good ideas here to mine for Drill (and similar distributed projects).
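To make the quoted abstract concrete, here is a minimal stdlib-only sketch of the two abstractions it names -- task-parallel "remote" functions and actors. This is not Ray's actual API; the `remote` decorator and `Counter` actor below are illustrative stand-ins built on concurrent.futures:

```python
# Sketch of task-parallel functions and actors using only the Python
# standard library. Illustrative only -- not Ray's real API.
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def remote(fn):
    """Turn a plain function into one that returns a future (a 'task')."""
    def submit(*args):
        return pool.submit(fn, *args)
    return submit

@remote
def square(x):
    return x * x

# Task-parallel side: submit a set of tasks, then gather the results.
futures = [square(i) for i in range(4)]
results = [f.result() for f in futures]  # [0, 1, 4, 9]

class Counter:
    """Actor-style object: every method runs on its own single-worker
    executor, so state mutations are serialized like an actor mailbox."""
    def __init__(self):
        self._mailbox = ThreadPoolExecutor(max_workers=1)
        self._count = 0

    def increment(self):
        return self._mailbox.submit(self._do_increment)

    def _do_increment(self):
        self._count += 1
        return self._count

counter = Counter()
# Three serialized increments; the last one observes the final count.
final = [counter.increment() for _ in range(3)][-1].result()  # 3
```

In real Ray the equivalent surface is the @ray.remote decorator (on functions for tasks, on classes for actors) plus ray.get() to resolve futures; the point of the sketch is just how small an API can cover both models.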

Thanks,
- Paul

[1] https://rise.cs.berkeley.edu/projects/ray/
[2] https://rise.cs.berkeley.edu/blog/publication/ray-distributed-framework-emerging-ai-applications/