Posted to dev@arrow.apache.org by Andy Grove <an...@gmail.com> on 2019/01/06 21:52:33 UTC

[Rust] [DISCUSS] Donate DataFusion to Arrow project

I'm starting a new thread for this discussion (this was previously
discussed in the Rust Roadmap thread).

The reason I got involved with Arrow is that I have been working on
DataFusion[1] which is currently an in-process SQL query engine on top of
Arrow. It allows queries to be executed against the Arrow CSV reader (and
will shortly support the Arrow Parquet reader too) and presents results as
a sequence of RecordBatch instances.

I would like to donate this code to the Arrow project so that Arrow has a
Rust-native query execution engine built in and to accelerate development
of this capability.

I have a fairly detailed roadmap[2] in mind for the project, and it could
eventually become a standalone project (still under the ASF).

I don't know what the process is to vote on this, so I wanted to discuss
that in this thread first.

References:

[1] DataFusion: https://github.com/andygrove/datafusion
[2] Roadmap: https://github.com/andygrove/datafusion/blob/master/ROADMAP.md

Thanks,

Andy.
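
A minimal, std-only Rust sketch of the execution model described in the post: a CSV-style source feeds a filter operator, and results come out as an iterator of record batches. `RecordBatch`, `CsvSource`, and `Filter` are hypothetical stand-ins, not the Arrow or DataFusion APIs, and the source assumes every field is an integer and every row has the same number of columns.

```rust
// Hypothetical, std-only sketch. RecordBatch, CsvSource and Filter are
// illustrative stand-ins, NOT the Arrow or DataFusion APIs.

/// A toy columnar batch: each inner Vec is one column of i64 values,
/// and all columns are assumed to have the same length.
#[derive(Debug)]
struct RecordBatch {
    columns: Vec<Vec<i64>>,
}

impl RecordBatch {
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }
}

/// Source operator standing in for a CSV reader: parses comma-separated
/// integer rows and yields batches of at most `batch_size` rows.
struct CsvSource<'a> {
    lines: std::str::Lines<'a>,
    num_columns: usize,
    batch_size: usize,
}

impl<'a> Iterator for CsvSource<'a> {
    type Item = RecordBatch;

    fn next(&mut self) -> Option<RecordBatch> {
        let mut columns = vec![Vec::new(); self.num_columns];
        for line in self.lines.by_ref().take(self.batch_size) {
            for (col, field) in columns.iter_mut().zip(line.split(',')) {
                col.push(field.trim().parse::<i64>().expect("integer field"));
            }
        }
        let batch = RecordBatch { columns };
        // An empty batch means the input is exhausted.
        if batch.num_rows() == 0 { None } else { Some(batch) }
    }
}

/// Filter operator: pulls batches from its input and keeps only the rows
/// for which `predicate` returns true.
struct Filter<I, P> {
    input: I,
    predicate: P,
}

impl<I, P> Iterator for Filter<I, P>
where
    I: Iterator<Item = RecordBatch>,
    P: Fn(&[i64]) -> bool,
{
    type Item = RecordBatch;

    fn next(&mut self) -> Option<RecordBatch> {
        // Loop so that batches with no surviving rows are skipped.
        while let Some(batch) = self.input.next() {
            let mut out = vec![Vec::new(); batch.columns.len()];
            let mut row = Vec::with_capacity(batch.columns.len());
            for r in 0..batch.num_rows() {
                row.clear();
                row.extend(batch.columns.iter().map(|c| c[r]));
                if (self.predicate)(row.as_slice()) {
                    for (dst, &v) in out.iter_mut().zip(row.iter()) {
                        dst.push(v);
                    }
                }
            }
            let filtered = RecordBatch { columns: out };
            if filtered.num_rows() > 0 {
                return Some(filtered);
            }
        }
        None
    }
}

fn main() {
    let csv = "1,10\n2,20\n3,30\n4,40\n5,50";
    let source = CsvSource { lines: csv.lines(), num_columns: 2, batch_size: 2 };
    // Roughly the plan for: SELECT * FROM t WHERE col1 > 15
    let plan = Filter { input: source, predicate: |row: &[i64]| row[1] > 15 };
    for batch in plan {
        println!("{:?}", batch.columns);
    }
}
```

Because each operator is just an `Iterator` over batches, operators compose by wrapping one another, and the caller consumes results one batch at a time rather than materializing the whole table.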

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Andy Grove <an...@gmail.com>.
Hi Neville,

Thanks for the support.

DataFrame and SQL are two different ways of building a logical query plan
and it makes sense that they should both build the same type of plan
without code duplication. It is also sometimes beneficial to mix and match
DataFrame and SQL operations (as per Apache Spark). I agree that this work
will help drive requirements for primitive operations which can be pushed
down into the core code.

Thanks,

Andy.
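
To illustrate the point that DataFrame and SQL front ends can build the same type of logical plan without duplicated code, here is a hypothetical std-only Rust sketch. `Expr`, `LogicalPlan`, `DataFrame`, and `sql_to_plan` are illustrative names, not DataFusion's API, and the SQL front end accepts only one toy query shape (`SELECT c1, c2 FROM t WHERE col > n`). It routes through the same builder methods the DataFrame path uses, so both paths yield equal plans.

```rust
// Hypothetical, std-only sketch. Expr, LogicalPlan, DataFrame and
// sql_to_plan are illustrative names, NOT the DataFusion API.

#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
enum LogicalPlan {
    Scan { table: String },
    Filter { predicate: Expr, input: Box<LogicalPlan> },
    Projection { columns: Vec<String>, input: Box<LogicalPlan> },
}

/// DataFrame front end: each call wraps the current plan in a new node.
struct DataFrame {
    plan: LogicalPlan,
}

impl DataFrame {
    fn scan(table: &str) -> Self {
        DataFrame { plan: LogicalPlan::Scan { table: table.to_string() } }
    }

    fn filter(self, predicate: Expr) -> Self {
        DataFrame { plan: LogicalPlan::Filter { predicate, input: Box::new(self.plan) } }
    }

    fn select(self, columns: &[&str]) -> Self {
        let columns = columns.iter().map(|c| c.to_string()).collect();
        DataFrame { plan: LogicalPlan::Projection { columns, input: Box::new(self.plan) } }
    }
}

/// Toy SQL front end for `SELECT c1, c2 FROM t WHERE col > n` only.
/// It reuses the DataFrame builder instead of constructing plans itself.
fn sql_to_plan(sql: &str) -> Option<LogicalPlan> {
    let tokens: Vec<&str> = sql.split_whitespace().collect();
    let from = tokens.iter().position(|t| t.eq_ignore_ascii_case("FROM"))?;
    let where_pos = tokens.iter().position(|t| t.eq_ignore_ascii_case("WHERE"))?;
    let well_formed = tokens.first()?.eq_ignore_ascii_case("SELECT")
        && from > 1
        && where_pos == from + 2
        && tokens.len() == where_pos + 4
        && tokens[where_pos + 2] == ">";
    if !well_formed {
        return None;
    }
    let column_list = tokens[1..from].join(" ");
    let columns: Vec<&str> = column_list.split(',').map(str::trim).collect();
    let literal: i64 = tokens[where_pos + 3].parse().ok()?;
    let predicate = Expr::Gt(
        Box::new(Expr::Column(tokens[where_pos + 1].to_string())),
        Box::new(Expr::Literal(literal)),
    );
    Some(DataFrame::scan(tokens[from + 1]).filter(predicate).select(&columns).plan)
}

fn main() {
    let from_df = DataFrame::scan("t")
        .filter(Expr::Gt(
            Box::new(Expr::Column("b".to_string())),
            Box::new(Expr::Literal(15)),
        ))
        .select(&["a", "b"])
        .plan;
    let from_sql = sql_to_plan("SELECT a, b FROM t WHERE b > 15").expect("supported query shape");
    // Both front ends produce the same plan tree.
    assert_eq!(from_df, from_sql);
    println!("{:?}", from_sql);
}
```

Since the plan is a plain data structure, mixing the two styles (for example, running SQL and then calling `filter` on the result) would only require wrapping the returned plan in a `DataFrame`.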

On Tue, Jan 8, 2019 at 8:07 AM Neville Dipale <ne...@gmail.com> wrote:

> Hi Andy,
>
> I can't comment on the voting process, but regarding the addition of
> DataFusion:
>
> I support the idea to donate the code, mainly as I think that will help us
> accelerate some work on Rust. Out of curiosity, I've been prototyping a
> 'Rust dataframe' abstraction which (can/will) have various scalar,
> aggregation, array and window functions.
>
> I'm doing this trying to put on the hat of someone wanting to use Rust in
> their binary or library. I'm already finding some things that might be
> *core* but are still not yet implemented. The presence of array_ops is also
> helpful because in addition to an efficient in-memory rep of data, they
> enable one to do some basic data manipulation on such data.
>
> Having DataFusion added to Arrow could help fill some gaps in our codebase;
> and I'm willing to work there.
>
> Regards
> Neville
>
> On Tue, 8 Jan 2019 at 16:14, Andy Grove <an...@gmail.com> wrote:
>
> > Bumping this thread ... I know everyone is busy with getting the 0.12
> > release out, but would be good to know the process for raising this for a
> > vote. However, given the lack of comments on this thread I'm starting to
> > suspect that maybe there isn't much of an appetite for this, which is
> > fine, but would be good to find out for sure.
> >
> > Thanks,
> >
> > Andy.
> >
> > On Mon, Jan 7, 2019 at 1:03 PM Andy Grove <an...@gmail.com> wrote:
> >
> > > Thanks, Ted!
> > >
> > > I wish I'd been a bit more specific about my ask in the original email...
> > > I guess my question (for Wes?) is what is the process to raise this for
> > > a vote?
> > >
> > > Andy.
> > >
> > >
> > >
> > > On Sun, Jan 6, 2019 at 2:59 PM Ted Dunning <te...@gmail.com> wrote:
> > >
> > >> Cool!

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Wes McKinney <we...@gmail.com>.
hi Andy -- yes. I'll send out the vote thread shortly

On Tue, Jan 22, 2019 at 8:27 AM Andy Grove <an...@gmail.com> wrote:
>
> Wes,
>
> With the 0.12 release out, could we now start the vote for the DataFusion
> donation?
>
> Thanks,
>
> Andy.

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Andy Grove <an...@gmail.com>.
Wes,

With the 0.12 release out, could we now start the vote for the DataFusion
donation?

Thanks,

Andy.

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Andy Grove <an...@gmail.com>.
Wes,

I went ahead and created a JIRA (
https://issues.apache.org/jira/browse/ARROW-4263) and PR (
https://github.com/apache/arrow/pull/3399) for the donation so it is ready
to go if the vote passes.

Thanks,

Andy.

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Andy Grove <an...@gmail.com>.
Wes,

Thanks. Yes, I'd like to proceed with the vote as soon as you are ready.

I don't think I need much time at all at this point to prepare the merge. I
already have a branch of DataFusion that is building against the latest
Arrow code, so it's really just a case of updating source files with the
correct license headers and updating the README. I will start on this
tonight.

Thanks,

Andy.



On Mon, Jan 14, 2019 at 1:16 PM Wes McKinney <we...@gmail.com> wrote:

> Getting the 0.12 release out is my priority right now, but it seems
> that there are no major objections to this code donation.
>
> @Andy -- I can kick off the vote to accept the code donation in the
> next few days if you'd like to proceed with that. How much time do you
> think it would take for you to ready the merge?
>
> Thanks,
> Wes
>
> On Wed, Jan 9, 2019 at 8:28 AM Andy Grove <an...@gmail.com> wrote:
> >
> > Wes,
> >
> > Thanks. This sounds great.
> >
> > Andy.
> >
> > On Tue, Jan 8, 2019 at 8:28 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > > hi Andy -- I'm supportive of the code donation. I see building
> > > in-memory, embeddable analytics and query processing as the natural
> > > next stage of this project. As I have described on this mailing list,
> > > I intend to work on this with my colleagues in C++ with the goal of
> > > making such functionality available at least in C, Python, R, and
> > > Ruby. I see no reason why such work should be exclusive to C++.
> > >
> > > Rust seems like a reasonable implementation language for this, and
> > > given growing interest in the language, I think it will help grow the
> > > Arrow community.
> > >
> > > I'd like to wait a few more days to allow others to weigh in, but we
> > > could conduct a vote about accepting the code donation as early as
> > > next week. We would need to go through the ASF IP Clearance process
> > > after that. So the entire procedural process would take about 6 days,
> > > assuming that there are no licensing issues and the code will be ready
> > > to merge into the Arrow codebase.
> > >
> > > Thanks
> > > Wes

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Wes McKinney <we...@gmail.com>.
Getting the 0.12 release out is my priority right now, but it seems
that there are no major objections to this code donation.

@Andy -- I can kick off the vote to accept the code donation in the
next few days if you'd like to proceed with that. How much time do you
think it would take for you to ready the merge?

Thanks,
Wes

On Wed, Jan 9, 2019 at 8:28 AM Andy Grove <an...@gmail.com> wrote:
>
> Wes,
>
> Thanks. This sounds great.
>
> Andy.
>

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Andy Grove <an...@gmail.com>.
Wes,

Thanks. This sounds great.

Andy.

On Tue, Jan 8, 2019 at 8:28 AM Wes McKinney <we...@gmail.com> wrote:

> hi Andy -- I'm supportive of the code donation. I see building
> in-memory, embeddable analytics and query processing as the natural
> next stage of this project. As I have described on this mailing list,
> I intend to work on this with my colleagues in C++ with the goal of
> making such functionality available at least in C, Python, R, and
> Ruby. I see no reason why such work should be exclusive to C++.
>
> Rust seems like a reasonable implementation language for this, and
> given growing interest in the language, I think it will help grow the
> Arrow community.
>
> I'd like to wait a few more days to allow others to weigh in, but we
> could conduct a vote about accepting the code donation as early as
> next week. We would need to go through the ASF IP Clearance process
> after that. So the entire procedural process would take about 6 days,
> assuming that there are no licensing issues and the code will be ready
> to merge into the Arrow codebase.
>
> Thanks
> Wes
>

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Wes McKinney <we...@gmail.com>.
hi Andy -- I'm supportive of the code donation. I see building
in-memory, embeddable analytics and query processing as the natural
next stage of this project. As I have described on this mailing list,
I intend to work on this with my colleagues in C++ with the goal of
making such functionality available at least in C, Python, R, and
Ruby. I see no reason why such work should be exclusive to C++.

Rust seems like a reasonable implementation language for this, and
given growing interest in the language, I think it will help grow the
Arrow community.

I'd like to wait a few more days to allow others to weigh in, but we
could conduct a vote about accepting the code donation as early as
next week. We would need to go through the ASF IP Clearance process
after that. So the entire procedural process would take about 6 days,
assuming that there are no licensing issues and the code will be ready
to merge into the Arrow codebase.

Thanks
Wes

On Tue, Jan 8, 2019 at 9:07 AM Neville Dipale <ne...@gmail.com> wrote:
>
> Hi Andy,
>
> I can't comment on the voting process, but regarding the addition of
> DataFusion:
>
> I support the idea to donate the code, mainly as I think that will help us
> accelerate some work on Rust. Out of curiosity, I've been prototyping a
> 'Rust dataframe' abstraction which (can/will) have various scalar,
> aggregation, array and window functions.
>
> I'm doing this trying to put on the hat of someone wanting to use Rust in
> their binary or library. I'm already finding some things that might be
> *core* but are still not yet implemented. The presence of array_ops is also
> helpful because in addition to an efficient in-memory rep of data, they
> enable one to do some basic data manipulation on such data.
>
> Having DataFusion added to Arrow could help fill some gaps in our codebase;
> and I'm willing to work there.
>
> Regards
> Neville
>

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Neville Dipale <ne...@gmail.com>.
Hi Andy,

I can't comment on the voting process, but regarding the addition of
DataFusion:

I support the idea to donate the code, mainly as I think that will help us
accelerate some work on Rust. Out of curiosity, I've been prototyping a
'Rust dataframe' abstraction which (can/will) have various scalar,
aggregation, array and window functions.

I'm doing this trying to put on the hat of someone wanting to use Rust in
their binary or library. I'm already finding some things that might be
*core* but are still not yet implemented. The presence of array_ops is also
helpful because in addition to an efficient in-memory rep of data, they
enable one to do some basic data manipulation on such data.

Having DataFusion added to Arrow could help fill some gaps in our codebase;
and I'm willing to work there.

Regards
Neville

On Tue, 8 Jan 2019 at 16:14, Andy Grove <an...@gmail.com> wrote:

> Bumping this thread ... I know everyone is busy with getting the 0.12
> release out, but would be good to know the process for raising this for a
> vote. However, given the lack of comments on this thread I'm starting to
> suspect that maybe there isn't much of an appetite for this, which is fine,
> but would be good to find out for sure.
>
> Thanks,
>
> Andy.
>

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Andy Grove <an...@gmail.com>.
Bumping this thread ... I know everyone is busy with getting the 0.12
release out, but would be good to know the process for raising this for a
vote. However, given the lack of comments on this thread I'm starting to
suspect that maybe there isn't much of an appetite for this, which is fine,
but would be good to find out for sure.

Thanks,

Andy.

On Mon, Jan 7, 2019 at 1:03 PM Andy Grove <an...@gmail.com> wrote:

> Thanks, Ted!
>
> I wish I'd been a bit more specific about my ask in the original email...
> I guess my question (for Wes?) is what is the process to raise this for a
> vote?
>
> Andy.
>

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Andy Grove <an...@gmail.com>.
Thanks, Ted!

I wish I'd been a bit more specific about my ask in the original email... I
guess my question (for Wes?) is what is the process to raise this for a
vote?

Andy.



On Sun, Jan 6, 2019 at 2:59 PM Ted Dunning <te...@gmail.com> wrote:

> Cool!
>

Re: [Rust] [DISCUSS] Donate DataFusion to Arrow project

Posted by Ted Dunning <te...@gmail.com>.
Cool!



On Sun, Jan 6, 2019 at 1:52 PM Andy Grove <an...@gmail.com> wrote:

> I'm starting a new thread for this discussion (this was previously
> discussed in the Rust Roadmap thread).
>
> The reason I got involved with Arrow is that I have been working on
> DataFusion[1] which is currently an in-process SQL query engine on top of
> Arrow. It allows queries to be executed against the Arrow CSV reader (and
> will shortly support the Arrow Parquet reader too) and presents results as
> a sequence of RecordBatch instances.
>
> I would like to donate this code to the Arrow project so that Arrow has a
> Rust-native query execution engine built in and to accelerate development
> of this capability.
>
> I have a fairly detailed roadmap[2] in mind for the project and it could
> eventually become a standalone project potentially (under ASF still).
>
> I don't know what the process is to vote on this, so wanted to discuss that
> in this thread first.
>
> References:
>
> [1] DataFusion: https://github.com/andygrove/datafusion
> [2] Roadmap:
> https://github.com/andygrove/datafusion/blob/master/ROADMAP.md
>
> Thanks,
>
> Andy.
>
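
The proposal quoted above describes DataFusion as an in-process query engine that runs queries against a reader and returns results as a sequence of RecordBatch instances. As a rough illustration of that batch-at-a-time model only, here is a std-only Rust sketch. It is not DataFusion or Arrow code: `Batch`, `scan`, `filter_gt` and `sum_values` are hypothetical names, and `Batch` is a simplified stand-in for Arrow's RecordBatch (two plain `Vec<i64>` columns instead of Arrow arrays plus a schema).

```rust
/// Hypothetical, simplified stand-in for an Arrow RecordBatch:
/// two equal-length columns of i64 values.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub ids: Vec<i64>,
    pub values: Vec<i64>,
}

impl Batch {
    pub fn num_rows(&self) -> usize {
        self.ids.len()
    }
}

/// Scan operator: splits in-memory rows into batches of at most
/// `batch_size` rows (standing in for a reader such as a CSV reader).
pub fn scan(rows: Vec<(i64, i64)>, batch_size: usize) -> impl Iterator<Item = Batch> {
    assert!(batch_size > 0, "batch_size must be positive");
    let chunks: Vec<Vec<(i64, i64)>> = rows.chunks(batch_size).map(|c| c.to_vec()).collect();
    chunks.into_iter().map(|chunk| Batch {
        ids: chunk.iter().map(|r| r.0).collect(),
        values: chunk.iter().map(|r| r.1).collect(),
    })
}

/// Filter operator: keeps rows whose `values` entry is greater than
/// `threshold`, one batch at a time, and drops batches that become empty.
pub fn filter_gt<I>(input: I, threshold: i64) -> impl Iterator<Item = Batch>
where
    I: Iterator<Item = Batch>,
{
    input.filter_map(move |batch| {
        let mut out = Batch { ids: Vec::new(), values: Vec::new() };
        for (id, v) in batch.ids.iter().zip(batch.values.iter()) {
            if *v > threshold {
                out.ids.push(*id);
                out.values.push(*v);
            }
        }
        if out.num_rows() == 0 { None } else { Some(out) }
    })
}

/// Aggregate: consumes the batch stream and sums the `values` column.
pub fn sum_values<I: Iterator<Item = Batch>>(input: I) -> i64 {
    input.map(|b| b.values.iter().sum::<i64>()).sum()
}

fn main() {
    let rows = vec![(1, 10), (2, 25), (3, 5), (4, 40), (5, 30)];
    // Roughly "SELECT SUM(value) FROM t WHERE value > 20", executed as a
    // pipeline of operators over batches of two rows each.
    for batch in filter_gt(scan(rows.clone(), 2), 20) {
        println!("{:?}", batch);
    }
    println!("sum = {}", sum_values(filter_gt(scan(rows, 2), 20)));
}
```

Each operator consumes and yields whole batches rather than single rows, which is the general idea behind presenting results as a sequence of batches; a real engine would work on Arrow arrays and schemas, and would plan the pipeline from SQL rather than wiring it by hand.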