Posted to user@arrow.apache.org by Bjoern Bachmann <bj...@gmx.de> on 2021/07/22 14:03:45 UTC
[C++] Computation functions in Apache Arrow
Hey,
I would like to better understand Arrow's built-in compute functionality. For
example, if I have a single array "X" of float type and I want to do the
following calculation:
X*gain + offset
How can I write it as a single instruction, or do I need to do it sequentially?
What do I need to do if gain or offset has a different data type?
Does Arrow support matrix operations?
Thanks!
Bjoern
Re: [C++] Computation functions in Apache Arrow
Posted by Rok Mihevc <ro...@gmail.com>.
On Thu, Jul 22, 2021 at 6:54 PM Weston Pace <we...@gmail.com> wrote:
> > Does Arrow support matrix operations?
>
> [...]
>
> On the other hand, there has been some interest in the past in
> representing tensors as a logical data type in Arrow. A rank 2 tensor
> is either the same as a matrix or very similar to a matrix (depending
> on who you ask). Matrix multiplication could be implemented as a
> compute kernel for arrays of rank-2 tensors. That being said, I have
> not seen any discussion or JIRA issues on tensor compute functions and
> so I don't know that anyone is working on that.
I'd be interested in working on tensor/sparse tensor computation
kernels if there's interest / need for it.
Rok
Re: [C++] Computation functions in Apache Arrow
Posted by Weston Pace <we...@gmail.com>.
> How can I write it as a single instruction, or do I need to do it sequentially?
You cannot do this in a single compute call today; you need one call to
multiply and a second call to add. ARROW-12060 [1] aims to add support
for running expressions as a single call, which may fill this need.
> What do I need to do if gain or offset has a different data type?
The compute machinery does some implicit casting. As long as both
types are numeric you should be OK. More details can be found at [2].
If the types are dissimilar (e.g. string/int32) you may need to
explicitly cast (with the cast compute function) the data yourself.
> Does Arrow support matrix operations?
No. You may be thinking of interpreting a table as a matrix. It is
natural to think of datasets and matrices as similar (e.g. I think
Python allows you to perform matrix multiplication on datasets), but I
don't think I've seen any discussion of doing the same in Arrow.
On the other hand, there has been some interest in the past in
representing tensors as a logical data type in Arrow. A rank 2 tensor
is either the same as a matrix or very similar to a matrix (depending
on who you ask). Matrix multiplication could be implemented as a
compute kernel for arrays of rank-2 tensors. That being said, I have
not seen any discussion or JIRA issues on tensor compute functions and
so I don't know that anyone is working on that.
[1] https://issues.apache.org/jira/browse/ARROW-12060
[2] https://arrow.apache.org/docs/cpp/compute.html#implicit-casts