Posted to dev@arrow.apache.org by Jiayu Liu <ji...@hey.com.INVALID> on 2021/06/22 07:18:03 UTC

Datafusion's vision and roadmap?

Hi,

This is regarding my question about DataFusion's vision and roadmap.

As a new contributor, I wonder what vision and roadmap most of the
contributors could align on (or may already be aligned on).

Maybe due to my lack of prior context I have missed such a
discussion, or maybe this is intentionally left open so that
different contributors and companies can build their own compatible
features. But I still believe in the value of having one, and it could
be shown in the README.md or the contributing guidelines, so that
users and the community can see what to expect and what to contribute to.

By "vision" I mean something that is necessarily vague and serves as an
overarching goal, e.g. "leverage Rust and Arrow to become the most
performant SQL-compatible query engine on a single node", or "be fully
compatible with (most of) the PostgreSQL syntax and pluggable into most
web-scale analytical engines".

I believe having this in place can help push the project forward,
especially when making trade-offs, e.g. sticking to the newest Rust release
vs. providing LTS, or incorporating as many features as possible (e.g.
recursive CTEs? BSON support? query materialization?) vs. keeping the
binary size small and moving everything else into plugins.

Re: Datafusion's vision and roadmap?

Posted by Andrew Lamb <al...@influxdata.com>.
This topic came up again, and I have started a PR [1] to see if we can
record / build a larger consensus.

Andrew
[1] https://github.com/apache/arrow-datafusion/pull/1104

On Tue, Jun 22, 2021 at 1:25 PM Andrew Lamb <al...@influxdata.com> wrote:

> Thank you for bringing this topic up.
>
> Expanding on what you suggested, how about this for a vision?
>
> DataFusion's vision is to become *the de facto query engine* of choice for
> new analytic applications, by leveraging the unique features of Rust and
> Apache Arrow to provide:
> 1. Best-in-class query performance for a single node
> 2. A feature-complete declarative query interface via (most of) PostgreSQL
> 3. A feature-rich procedural interface for creating and running execution
> plans
> 4. High-performance extensibility at every layer
>
> The current README [2] describes *what* DataFusion is, but does not really
> give a vision going forward. A few months ago we tried a "what is everyone
> thinking of working on" approach [1] to create a roadmap. While that
> was insightful, I agree that a single unified (even if vague) goal would
> be very helpful.
>
> I would welcome other thoughts as well: if there appears to be some
> consensus, then we can make a PR to add the proposal to the DataFusion README.
>
> @Andy Grove <an...@gmail.com> do you have any thoughts?
>
> Andrew
>
>
> [1]
> https://docs.google.com/document/d/1qspsOM_dknOxJKdGvKbC1aoVoO0M3i6x1CIo58mmN2Y/edit?userstoinvite=jonas.hansen%40airbus.com&ts=604a2a22&actionButton=1
> [2] https://github.com/apache/arrow-datafusion#readme

Re: Datafusion's vision and roadmap?

Posted by Andrew Lamb <al...@influxdata.com>.
Thank you for bringing this topic up.

Expanding on what you suggested, how about this for a vision?

DataFusion's vision is to become *the de facto query engine* of choice for
new analytic applications, by leveraging the unique features of Rust and
Apache Arrow to provide:
1. Best-in-class query performance for a single node
2. A feature-complete declarative query interface via (most of) PostgreSQL
3. A feature-rich procedural interface for creating and running execution
plans
4. High-performance extensibility at every layer

The current README [2] describes *what* DataFusion is, but does not really
give a vision going forward. A few months ago we tried a "what is everyone
thinking of working on" approach [1] to create a roadmap. While that
was insightful, I agree that a single unified (even if vague) goal would
be very helpful.

I would welcome other thoughts as well: if there appears to be some
consensus, then we can make a PR to add the proposal to the DataFusion README.

@Andy Grove <an...@gmail.com> do you have any thoughts?

Andrew


[1]
https://docs.google.com/document/d/1qspsOM_dknOxJKdGvKbC1aoVoO0M3i6x1CIo58mmN2Y/edit?userstoinvite=jonas.hansen%40airbus.com&ts=604a2a22&actionButton=1
[2] https://github.com/apache/arrow-datafusion#readme
