Posted to user@arrow.apache.org by Hinko Kocevar <Hi...@ess.eu> on 2023/05/16 13:32:24 UTC

pyArrow calling into user C++ code

Hi,

I'm trying to understand whether C/C++ code (homebrew code) can be integrated with Arrow such that a PyArrow user would be able to call the homebrew functions from a Python script.

The idea is to pass an Arrow array/table (or NumPy array?) to the external code, let it work on the input(s) to produce an Arrow output array, and return it to the user. Again, the user's programming language of choice is Python. I've noticed the C data interface and C stream interface, as well as user compute functions, in the docs. It is not clear to me whether any of those support my use case and, furthermore, how I would use them from Python once implemented in C++.

For example, something like https://numpy.org/doc/stable/user/c-info.html is what I would be after.

Can this be done in (py)arrow, or should I just do it in NumPy?

Thank you,
Hinko

Re: pyArrow calling into user C++ code

Posted by Hinko Kocevar <Hi...@ess.eu>.
Thank you both for the input!
//Hinko

Re: pyArrow calling into user C++ code

Posted by Aldrin <oc...@pm.me>.
Ooh, this is cool and a great point. I wasn't thinking of the development experience in my initial response. I have used the approach I mentioned before, and because I was not using the same toolchain I had to rebuild pyarrow from source. I'll hold off on an example of that, since I think Weston's suggestion is a great one (and probably something I'll try in the near future).

# ------------------------------
# Aldrin

https://github.com/drin/
https://gitlab.com/octalene

Re: pyArrow calling into user C++ code

Posted by Weston Pace <we...@gmail.com>.
The approach on the page that Aldrin linked is possible, but it requires
that you use the same toolchain and Arrow version as pyarrow.  I would
probably advise using the C data API (the Arrow C data interface) first.
Because it is a stable ABI, you don't have to couple yourself so tightly
to the pyarrow build.  For example, your C++ extension can pin itself to
Arrow version 5 and people using pyarrow 11 will still be able to use your
extension without problems.

Since this question comes up fairly often, I decided to create a quick
minimal example of what this might look like.  The example creates a C++
Python module using pybind11.  The C++ code relies on Arrow-C++ and
interoperates with pyarrow.  You do not have to use Arrow-C++: you could
use nanoarrow instead, or copy the C data API headers directly into your
project.  The example can be found at [1].

[1]: https://github.com/westonpace/arrow-cdata-example
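
To give a rough idea of the "copy the C data API headers into your
project" option, here is a minimal sketch that does not link against
Arrow-C++ at all.  The two struct layouts are the ones published in the
Arrow C data interface specification.  Everything else is an assumption
made for illustration: DoubleInt64 and ReleaseDoubled are hypothetical
names (they are not taken from the example repository), only int64 arrays
without nulls are handled, and the pybind11 glue plus the pyarrow
export/import calls on the Python side are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Struct layouts as published in the Arrow C data interface specification.
extern "C" {
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
}

// Hypothetical release callback for arrays produced by DoubleInt64.
static void ReleaseDoubled(ArrowArray* array) {
  std::free(const_cast<void*>(array->buffers[1]));  // values buffer
  std::free(array->buffers);                        // buffer pointer table
  array->release = nullptr;  // the spec's marker for "already released"
}

// Hypothetical homebrew kernel: out[i] = 2 * in[i] for an int64 array with
// no nulls. The caller keeps ownership of `in`. Returns 0 on success.
int DoubleInt64(const ArrowSchema* schema, const ArrowArray* in,
                ArrowArray* out) {
  if (std::strcmp(schema->format, "l") != 0) return 1;  // "l" means int64
  if (in->null_count != 0) return 2;  // also rejects -1 (count unknown)
  assert(in->n_buffers == 2);         // validity bitmap + values

  const int64_t* src =
      static_cast<const int64_t*>(in->buffers[1]) + in->offset;
  const int64_t n = in->length;
  int64_t* dst = static_cast<int64_t*>(
      std::malloc(sizeof(int64_t) * static_cast<std::size_t>(n > 0 ? n : 1)));
  const void** buffers =
      static_cast<const void**>(std::malloc(2 * sizeof(void*)));
  if (dst == nullptr || buffers == nullptr) {
    std::free(dst);
    std::free(buffers);
    return 3;
  }
  for (int64_t i = 0; i < n; ++i) dst[i] = 2 * src[i];

  buffers[0] = nullptr;  // no validity bitmap: every value is valid
  buffers[1] = dst;
  out->length = n;
  out->null_count = 0;
  out->offset = 0;
  out->n_buffers = 2;
  out->n_children = 0;
  out->buffers = buffers;
  out->children = nullptr;
  out->dictionary = nullptr;
  out->release = ReleaseDoubled;
  out->private_data = nullptr;
  return 0;
}
```

In this sketch the output array owns freshly allocated memory and hands
the consumer a release callback, which is how the C data interface lets
whoever imports the result (pyarrow, for instance) free it later without
knowing how it was allocated.  The input is only read, so the caller
remains responsible for releasing it.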

Re: pyArrow calling into user C++ code

Posted by Aldrin <oc...@pm.me>.
You can definitely use C++! I will see if I can find an example, but in the
meantime there's also this page in the docs [1].

  

[1]: https://arrow.apache.org/docs/python/integration/extending.html

  
