Posted to dev@arrow.apache.org by Yaron Gvili <rt...@hotmail.com> on 2022/06/06 21:54:14 UTC

arithmetic manipulation of PyArrow numeric arrays

Hi,

This is likely a question (or two) with a simple answer that I couldn't easily find. While working with PyArrow UDFs, I tried implementing a simple UDF (see first function below) and noticed that it failed upon receiving a pyarrow.lib.DoubleArray which cannot be directly manipulated with arithmetic operations like multiplication by an int. I was able to get around it by converting to pandas, manipulating, and converting back (see second function below), but this seems awkward. What is the idiomatic way of performing arithmetic operations on PyArrow numeric arrays? Does it make sense to add arithmetic operation support to PyArrow numeric arrays?


def twice(v):
    return v * 2  # fails with TypeError("unsupported operand type(s) for *: 'pyarrow.lib.DoubleArray' and 'int'")

def twice(v):
    return pa.FloatingPointArray.from_pandas(v.to_pandas() * 2)  # works but seems awkward


Cheers,
Yaron.

Re: arithmetic manipulation of PyArrow numeric arrays

Posted by Yaron Gvili <rt...@hotmail.com>.
Perfect, thanks.


Yaron.
________________________________
From: Will Jones <wi...@gmail.com>
Sent: Monday, June 6, 2022 6:06 PM
To: dev@arrow.apache.org <de...@arrow.apache.org>
Subject: Re: arithmetic manipulation of PyArrow numeric arrays

Hi Yaron,

Currently, arithmetic operators are exposed through the
pyarrow.compute module:

import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array([1, 2, 3])
pc.add(arr, 2)  # Add 2 to every element
pc.multiply(arr, 20)  # Multiply every element by 20

I actually just opened an issue on making this more convenient [1].

Best,
Will Jones

[1] https://issues.apache.org/jira/browse/ARROW-16658