Posted to dev@arrow.apache.org by Clark Fitzgerald <cl...@gmail.com> on 2017/07/27 23:40:43 UTC

R arrow first steps

I've got at least a "hello world" for R / Arrow bindings in progress.
https://github.com/clarkfitzg/Rarrow

Over the next couple weeks I plan to spend some time looking at the Arrow
C++ and Python sources and write a few bindings by hand, then think about
how to automatically generate bindings from the C++. Several approaches are
possible, Rffi / rdyncall, Rcpp modules, or RCodegen / RCIndex leveraging
Clang. Not sure which, if any, will work.

I'm a beginner in C++. It would be very helpful if someone was available to
answer questions on the C++ Arrow codebase, since I'd rather not email the
whole dev list for this.

Thanks,
Clark

Re: R arrow first steps

Posted by Clark Fitzgerald <cl...@gmail.com>.
> would you be open to working within an R/ subdirectory in the Arrow
> codebase?

Sure, I'll do whatever is most convenient for the team. A branch sounds
fine. Here's one from my end: https://github.com/clarkfitzg/arrow/tree/R/R

Thanks for the pointers and encouragement.

On Thu, Jul 27, 2017 at 5:49 PM, Wes McKinney <we...@gmail.com> wrote:

> hi Clark,
>
> Cool! Before you go too far down the rabbit hole, would you be open to
> working within an R/ subdirectory in the Arrow codebase? It doesn't
> have to be ready-to-ship software, and we are happy to set up a branch
> in the repository for you to experiment so you don't have to worry
> about bothering the master branch or breaking builds. Otherwise
> importing your work into the project later will become more
> complicated and require the Arrow PMC to do some paperwork:
> http://incubator.apache.org/ip-clearance/ .
>
> I am happy to be available to answer questions on the mailing list, or
> offline, or discussions in JIRA or on GitHub pull requests. I am sure
> that Uwe and the other C++ developers will be happy to be available.
>
> To get some basics off the ground, the essentials are being able to
> convert one or more record batches into an R data frame, and back.
> This is what we did in
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/arrow_to_pandas.h
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pandas_to_arrow.h
>
> We have thin bindings in Cython (which is similar to Rcpp) that make
> this callable from Python.
>
> What Hadley and I put together quickly for Feather last year was
> effectively a single Arrow record batch converting to and from pandas
> or R data frames. In Arrow, in practice you may be working with a
> table in many smaller chunks.
>
> Looking forward to getting this off the ground!
>
> Thanks,
> Wes
>
> On Thu, Jul 27, 2017 at 7:40 PM, Clark Fitzgerald <cl...@gmail.com> wrote:
> > I've got at least a "hello world" for R / Arrow bindings in progress.
> > https://github.com/clarkfitzg/Rarrow
> >
> > Over the next couple weeks I plan to spend some time looking at the Arrow
> > C++ and Python sources and write a few bindings by hand, then think about
> > how to automatically generate bindings from the C++. Several approaches are
> > possible, Rffi / rdyncall, Rcpp modules, or RCodegen / RCIndex leveraging
> > Clang. Not sure which, if any, will work.
> >
> > I'm a beginner in C++. It would be very helpful if someone was available to
> > answer questions on the C++ Arrow codebase, since I'd rather not email the
> > whole dev list for this.
> >
> > Thanks,
> > Clark
>

Re: R arrow first steps

Posted by Wes McKinney <we...@gmail.com>.
hi Clark,

Cool! Before you go too far down the rabbit hole, would you be open to
working within an R/ subdirectory in the Arrow codebase? It doesn't
have to be ready-to-ship software, and we are happy to set up a branch
in the repository for you to experiment so you don't have to worry
about bothering the master branch or breaking builds. Otherwise
importing your work into the project later will become more
complicated and require the Arrow PMC to do some paperwork:
http://incubator.apache.org/ip-clearance/ .

I am happy to be available to answer questions on the mailing list, or
offline, or discussions in JIRA or on GitHub pull requests. I am sure
that Uwe and the other C++ developers will be happy to be available.

To get some basics off the ground, the essentials are being able to
convert one or more record batches into an R data frame, and back.
This is what we did in

https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/arrow_to_pandas.h
https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pandas_to_arrow.h

We have thin bindings in Cython (which is similar to Rcpp) that make
this callable from Python.

What Hadley and I put together quickly for Feather last year was
effectively a single Arrow record batch converting to and from pandas
or R data frames. In Arrow, in practice you may be working with a
table in many smaller chunks.
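
To make the round trip concrete, here is a minimal conceptual sketch in
plain Python. It does not use the Arrow API: a "record batch" is modelled
as a dict of equal-length column lists and a "data frame" as one such dict
holding all rows. The names batches_to_frame and frame_to_batches are
hypothetical and exist only for this illustration.

```python
# Conceptual sketch only: plain Python stand-ins, not the Arrow API.
from typing import Dict, List

# Hypothetical stand-in: a record batch maps column name -> list of values,
# with every column the same length.
RecordBatch = Dict[str, List]


def batches_to_frame(batches: List[RecordBatch]) -> Dict[str, List]:
    """Concatenate same-schema batches column by column into one frame."""
    if not batches:
        return {}
    names = list(batches[0])
    for batch in batches:
        if list(batch) != names:
            raise ValueError("all batches must share the same columns")
    return {name: [v for batch in batches for v in batch[name]] for name in names}


def frame_to_batches(frame: Dict[str, List], chunk_rows: int) -> List[RecordBatch]:
    """Split a frame into batches of at most chunk_rows rows each."""
    if chunk_rows < 1:
        raise ValueError("chunk_rows must be positive")
    n_rows = len(next(iter(frame.values()))) if frame else 0
    return [
        {name: col[start:start + chunk_rows] for name, col in frame.items()}
        for start in range(0, n_rows, chunk_rows)
    ]


# A table held as two chunks of unequal size.
batches = [
    {"id": [1, 2], "label": ["a", "b"]},
    {"id": [3], "label": ["c"]},
]
frame = batches_to_frame(batches)
round_trip = frame_to_batches(frame, chunk_rows=2)
```

The single-batch case is just a list of length one, which is why handling
"one or more" batches covers both a Feather-style single batch and a
chunked table with the same conversion path.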

Looking forward to getting this off the ground!

Thanks,
Wes

On Thu, Jul 27, 2017 at 7:40 PM, Clark Fitzgerald <cl...@gmail.com> wrote:
> I've got at least a "hello world" for R / Arrow bindings in progress.
> https://github.com/clarkfitzg/Rarrow
>
> Over the next couple weeks I plan to spend some time looking at the Arrow
> C++ and Python sources and write a few bindings by hand, then think about
> how to automatically generate bindings from the C++. Several approaches are
> possible, Rffi / rdyncall, Rcpp modules, or RCodegen / RCIndex leveraging
> Clang. Not sure which, if any, will work.
>
> I'm a beginner in C++. It would be very helpful if someone was available to
> answer questions on the C++ Arrow codebase, since I'd rather not email the
> whole dev list for this.
>
> Thanks,
> Clark