Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2018/12/07 03:36:57 UTC

Thoughts about 2019 Arrow development focus areas

hi folks,

I jotted down some high-level ideas about directions I'd like to push
the various parts of the project on the C++ side along with the
language bindings in Python, R, Ruby, and others. Many people may know
that I am building a not-for-profit open source development team to
focus on Apache Arrow (https://ursalabs.org/), so this document is
partly for my colleagues to organize some lower-level technical
discussions and planning in the Arrow JIRA. I'm interested in
feedback from the whole Arrow community, and we obviously would love
to have as many people as possible involved who have an interest in
the C++ libraries and their bindings.

The simplified summary is that I would like to work toward an
embeddable in-memory query engine in C++ that can be used in all the
bindings. This can be used in numerous contexts, from data frame
libraries to streaming data transformation. As a simple example, we
could compile filter expressions with Gandiva and apply these to a
stream of record batches being materialized from a directory of
Parquet files.

There are a lot of pieces that still have to fall into place to do
this in a sustainable and non-hacky way.

https://docs.google.com/document/d/12dWBniKW2JQ-5djE3SPjyQXVquCAEmLXVlb1dnhLhQ0/edit#heading=h.62rx18p423rw

Looking forward to the feedback of others!

Thanks
Wes

Re: Thoughts about 2019 Arrow development focus areas

Posted by Andy Grove <an...@gmail.com>.

Wes,

This is very exciting. Thanks for writing up the detailed document.

I think it is time for me to start brushing up on modern C++.

Andy.
