Posted to user@arrow.apache.org by Bjoern Bachmann <bj...@gmx.de> on 2021/07/22 14:03:45 UTC

[C++] Computation functions in Apache Arrow

Hey,

I would like to better understand Arrow's built-in compute functionality. For
example, if I have a single array "X" of float type and I want to do the
following calculation:

X*gain + offset and I

How can I write it in a single instruction line or do I need it sequentially?
What do I need to do if gain or offset has a different data type?

Does arrow support matrix operations?

Thanks!

Bjoern


Re: [C++] Computation functions in Apache Arrow

Posted by Rok Mihevc <ro...@gmail.com>.
On Thu, Jul 22, 2021 at 6:54 PM Weston Pace <we...@gmail.com> wrote:
> > Does arrow support matrix operations?
>
> [...]
>
> On the other hand, there has been some interest in the past in
> representing tensors as a logical data type in Arrow.  A rank 2 tensor
> is either the same as a matrix or very similar to a matrix (depending
> on who you ask).  Matrix multiplication could be implemented as a
> compute kernel for arrays of rank-2 tensors.  That being said, I have
> not seen any discussion or JIRA issues on tensor compute functions and
> so I don't know that anyone is working on that.

I'd be interested in working on tensor/sparse tensor computation
kernels if there's interest / need for it.

Rok

Re: [C++] Computation functions in Apache Arrow

Posted by Weston Pace <we...@gmail.com>.
> How can I write it in a single instruction line or do I need it sequentially?

You cannot do this as a single compute call today; you would call the
multiply function and then the add function on its result.  ARROW-12060[1]
aims to add support for running expressions as a single call, which may
fill this need.

> What do I need to do if gain or offset has a different data type?

The compute machinery does some implicit casting.  So long as both
types are numeric you should be ok.  More details can be found at [2].
If the types are dissimilar (e.g. string/int32) you may need to
explicitly cast (with the cast compute function) the data yourself.

> Does arrow support matrix operations?

No.  You may be thinking of interpreting a table as a matrix.  It is
natural to think of datasets and matrices as similar (e.g. I think
Python allows you to perform matrix multiplication on datasets), but I
don't think I've seen any discussion of doing the same in Arrow.

On the other hand, there has been some interest in the past in
representing tensors as a logical data type in Arrow.  A rank 2 tensor
is either the same as a matrix or very similar to a matrix (depending
on who you ask).  Matrix multiplication could be implemented as a
compute kernel for arrays of rank-2 tensors.  That being said, I have
not seen any discussion or JIRA issues on tensor compute functions and
so I don't know that anyone is working on that.

[1] https://issues.apache.org/jira/browse/ARROW-12060
[2] https://arrow.apache.org/docs/cpp/compute.html#implicit-casts
