Posted to dev@arrow.apache.org by Andrew Lamb <al...@influxdata.com> on 2021/10/11 11:26:24 UTC

Re: Datafusion's vision and roadmap?

This topic came up again, and I have started a PR [1] to see if we can
record / build a larger consensus.

Andrew
[1]  https://github.com/apache/arrow-datafusion/pull/1104

On Tue, Jun 22, 2021 at 1:25 PM Andrew Lamb <al...@influxdata.com> wrote:

> Thank you for bringing this topic up.
>
> Expanding on what you suggested, here is another take on a vision:
>
> DataFusion's vision is to become *the de facto query engine* of choice for
> new analytic applications, by leveraging the unique features of Rust and
> Apache Arrow to provide:
> 1. Best-in-class query performance for a single node
> 2. A feature-complete declarative query interface via (most of)
> PostgreSQL
> 3. A feature-rich procedural interface for creating and running execution
> plans
> 4. High-performance extensibility at every layer
>
> The current readme [2] describes *what* DataFusion is, but does not really
> give a vision going forward. A few months ago we tried a "what is everyone
> thinking of working on" type approach [1] to create a roadmap. While that
> was insightful, I agree that having a single unified (even if vague) goal
> would be very helpful.
>
> I would welcome other thoughts as well: if there appears to be some
> consensus, then we can make a PR to add the proposal to the DataFusion
> readme.
>
> @Andy Grove <an...@gmail.com>  do you have any thoughts?
>
> Andrew
>
>
> [1]
> https://docs.google.com/document/d/1qspsOM_dknOxJKdGvKbC1aoVoO0M3i6x1CIo58mmN2Y/edit?userstoinvite=jonas.hansen%40airbus.com&ts=604a2a22&actionButton=1
> [2] https://github.com/apache/arrow-datafusion#readme
>
> On Tue, Jun 22, 2021 at 3:18 AM Jiayu Liu <ji...@hey.com.invalid> wrote:
>
>> Hi,
>>
>> This is regarding my question about DataFusion's vision and roadmap.
>>
>> As a new contributor, I wonder what vision and roadmap most of the
>> contributors can be, or already are, aligned on.
>>
>> Due to my lack of prior context I may have missed such a discussion, or
>> maybe this is intentionally left open so that different contributors and
>> companies can keep their own features compatible. But I still believe in
>> the value of having one, and it could be shown in the README.md or the
>> contributing guidelines, so that users and the community can see what to
>> expect from the project and what to contribute to.
>>
>> By "vision" I mean something that's necessarily vague and serves as an
>> overarching goal, e.g. "leveraging Rust and Arrow to become the most
>> performant SQL-compatible query engine on a single node", or "fully
>> compatible with (most of) PostgreSQL syntax and pluggable in most of the
>> web-scale analytical engines".
>>
>> I believe having this in place can help push the project forward,
>> esp. in cases of trade-offs, e.g. sticking to the newest Rust release vs.
>> providing LTS, or incorporating as many features as possible (e.g.
>> recursive CTE? BSON support? query materializations?) vs. keeping the
>> binary size small and moving everything else into plugins.
>>
>