Posted to dev@arrow.apache.org by Yibo Cai <yi...@arm.com> on 2019/11/08 07:25:25 UTC

Re: some questions, please help

Hi Wes,

On 10/30/19 10:24 PM, Wes McKinney wrote:
> hi Yibo
> 
> On Wed, Oct 30, 2019 at 2:16 AM Yibo Cai <yi...@arm.com> wrote:
>>
>> Hi,
>>
>> I'm new to Arrow and would like to ask for help with a few questions. Any comments are welcome.
>>
>> - About the source code tree: my understanding is that "cpp" is the core Arrow library, and "c_glib, go, python, ..." are language bindings that ease integrating Arrow into apps written in those languages. Is that correct?
> 
> No. We have 6 core implementations: C++, C#, Go, Java, JavaScript, and Rust
> 
> * C/GLib, MATLAB, Python, R bind to C++
> * Ruby binds to GLib
> 

I wonder how Arrow deals with gaps between the different implementations. Say the C++ library implements some features that the Go library doesn't support. Is there a consistent API document, or separate documents for each language implementation?

>> - Arrow implements many data types and aggregation functions (sum, mean, ...). [1]
>>     IMO, more functions and types should be supported, such as min/max, vector/tensor operations, big numbers, etc. I'm not sure whether this is in Arrow's scope, or whether apps using Arrow should handle it themselves.
> 
> Our objective at least in the C++ library is to have a generally
> useful "standard library" that handles common application concerns.
> Whether or not something is thought to be in scope may vary on a
> case-by-case basis -- if you can't find a JIRA issue for something in
> particular, please go ahead and open one.
> 
>> - I see some SIMD optimizations in the Arrow Go implementation, such as a vectorized sum. [2]
>>     But the Arrow C++ library doesn't leverage SIMD. [3]
>>     Why not optimize it in the C++ library so that all languages can benefit?
> 
> You're welcome to contribute such optimizations to the C++ library.
> 
> 
> - Wes
> 
>> [1] https://github.com/apache/arrow/tree/master/cpp/src/arrow/compute/kernels
>> [2] https://github.com/apache/arrow/blob/master/go/arrow/math/float64_avx2_amd64.s
>> [3] https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/sum_internal.h#L99-L111
>>
>> Yibo
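
To make the SIMD question above concrete, here is a minimal C++ sketch of one common way to make a float64 sum loop SIMD-friendly without hand-written assembly. It is not Arrow's actual kernel, and the function name `SumFloat64Unrolled` is hypothetical. The idea is to keep several independent partial sums so the loop no longer depends on a single running total; a compiler may then map those accumulators onto vector lanes, or a hand-written AVX2 version (like the Go assembly in [2]) can do so explicitly. Reordering floating-point additions can change the last bits of the result compared with a strictly sequential sum.

```cpp
#include <cstddef>

// Hypothetical helper: sums a contiguous buffer of doubles using four
// independent accumulators. Breaking the single serial dependency chain
// gives the compiler (or a hand-written SIMD version) room to process
// several elements per step. The addition order differs from a plain
// left-to-right loop, so results may differ slightly for non-integral data.
double SumFloat64Unrolled(const double* values, std::size_t length) {
  double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
  std::size_t i = 0;

  // Main loop: four elements per iteration, one per accumulator.
  for (; i + 4 <= length; i += 4) {
    acc0 += values[i];
    acc1 += values[i + 1];
    acc2 += values[i + 2];
    acc3 += values[i + 3];
  }

  // Tail: the remaining 0-3 elements.
  double tail = 0.0;
  for (; i < length; ++i) {
    tail += values[i];
  }

  // Combine the partial sums pairwise.
  return (acc0 + acc1) + (acc2 + acc3) + tail;
}
```

For example, summing `{1, 2, 3, 4, 5, 6, 7}` runs one unrolled iteration over the first four values and adds the last three in the tail loop.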

Re: some questions, please help

Posted by Micah Kornfield <em...@gmail.com>.
>
> I wonder how Arrow deals with gaps between the different implementations. Say
> the C++ library implements some features that the Go library doesn't support. Is there a
> consistent API document, or separate documents for each language implementation?


It is important to distinguish between two types of functionality:
1. Supporting all the features of the interchange format(s). In this
case, the canonical document is the format specification [1].
2. Additional functionality for processing Arrow data (e.g. query engines,
slicing, etc.).
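
As an illustration of what supporting the interchange format (item 1) means at the buffer level, here is a minimal C++ sketch of how a consumer reads a nullable float64 array from its validity bitmap and values buffer, following the columnar spec [1]: slot i is valid when bit (i % 8) of byte (i / 8) is set (least-significant-bit numbering), and an array with a null count of 0 may omit the bitmap entirely. The function names are hypothetical, not part of any Arrow library API, and the sketch ignores array offsets and buffer alignment for brevity.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns whether slot i is non-null according to an
// Arrow-style validity bitmap (LSB bit numbering, 1 = valid). A null pointer
// models an array whose bitmap was not allocated, meaning all slots are valid.
bool IsValid(const std::uint8_t* validity, std::size_t i) {
  if (validity == nullptr) {
    return true;
  }
  return (validity[i / 8] >> (i % 8)) & 1;
}

// Hypothetical helper: sums only the valid slots of a float64 array given its
// validity bitmap and values buffer. The contents of the values buffer behind
// null slots are unspecified, so those slots are skipped.
double SumValidFloat64(const std::uint8_t* validity, const double* values,
                       std::size_t length) {
  double total = 0.0;
  for (std::size_t i = 0; i < length; ++i) {
    if (IsValid(validity, i)) {
      total += values[i];
    }
  }
  return total;
}
```

With values `{1.0, 2.0, 100.0, 4.0}` and the single bitmap byte `0x0B` (binary `00001011`), slot 2 is null, so only 1.0 + 2.0 + 4.0 contribute to the sum.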

For 1, we have integration tests [2], and the known gaps in some implementations
(search for skip.add in datagen.py) should all have associated JIRAs.
Some implementations (e.g. C#) have not been added to the
integration tests at all.

For 2, the community has not been concerned with keeping feature parity.
For instance, the Java library has a substantially different class
naming/hierarchy than C++. Also, at least at the moment, no one has
expressed interest in implementing a query engine/dataframe library as part
of the Arrow project in Java (work has mostly focused on performance
improvements and on some algorithms that contributors have found
useful).

Hope this helps.

-Micah

[1] https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst
[2] https://github.com/apache/arrow/blob/5ca85922ae90bacb96d939503e53e83e6ec47f8c/dev/archery/archery/integration/datagen.py
