Posted to user@arrow.apache.org by Max Grossman <jm...@gmail.com> on 2020/08/14 21:24:10 UTC

returning an arrow::Array from C++ to Python through pybind11

Hi all,

I've written a C++ library that uses Arrow for its in-memory data
structures. I'd like to also add some Python APIs on top of this
library using pybind11, so that I can grab pyarrow wrappers of my C++
Arrow Arrays, convert them to NumPy arrays, and then pass them to
scikit-learn (or other Python libraries) without copying data around.

As an example, on the C++ side I've got a 1D vector class that wraps
an arrow array and has a method to convert the arrow array into a
pyarrow array:

        PyObject* get_local_pyarrow_array() {
            return arrow::py::wrap_array(std::dynamic_pointer_cast<arrow::Array,
                arrow::FixedSizeBinaryArray>(_arr));
        }

I've got some pybind11 registration code that registers the class and
that method:

    py::class_<ShmemML1D<double>>(m, "ShmemML1DD")
        .def("get_local_pyarrow_array",
                &ShmemML1D<double>::get_local_pyarrow_array);

And then I've got some Python code that calls this method (and which I
hope gets a pyarrow array as the return value):

    arr = dist_arr.get_local_pyarrow_array()

Note that these are arrays that I'm constructing in C++ code and want
to expose to Python, so I don't have a pre-existing pyarrow object to
use. I'm trying to create a new one around each of my C++ arrays, so
that Python code can start manipulating them.

When I build and run all this, I just get the error "Unable to convert
function return value to a Python type!":

Traceback (most recent call last):
  File "/global/homes/j/jmg3/shmem_ml/example/python_wrapper.py", line 15, in <module>
    random.rand(vec)
  File "/global/homes/j/jmg3/shmem_ml/src/shmem_ml/random.py", line 8, in rand
    arr = dist_arr.get_local_pyarrow_array()
TypeError: Unable to convert function return value to a Python type!
The signature was
        (self: shmem_ml.core.ShmemML1DD) -> _object

I'm new to pybind11, so I suspect this is less a problem with my Arrow
usage than with my pybind11 usage. Is there a better, recommended way
to do this for pyarrow applications? The docs seem to have Cython
examples; would the suggestion be to drop pybind11 and write a wrapper
of my C++ class in Cython?

Thanks for any suggestions,

Max

Re: returning an arrow::Array from C++ to Python through pybind11

Posted by Wes McKinney <we...@gmail.com>.
Are you using the arrow::py::wrap_array function? You can follow other
pybind11 projects that successfully use the pyarrow C/C++ API, such as
turbodbc. You also have to call the import_pyarrow() function:

https://github.com/blue-yonder/turbodbc/blob/0369d1329a0ea39982a4d8d169b8dd3f473e6689/cpp/turbodbc_arrow/Library/src/arrow_result_set.cpp#L338
