Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2020/10/20 01:46:38 UTC

Re: Arrow C Data Interface

hi Pasha,

Copying dev@.

You can see how DuckDB interacts with the pyarrow data structures
through the C interface here; maybe it's helpful:

https://github.com/cwida/duckdb/blob/master/tools/pythonpkg/duckdb_python.cpp

We haven't defined a Python API (at either the C API level or the
Python API level) through which objects can advertise that they support
the Arrow C interface. That is a separate issue from the C interface
itself (which has nothing specifically to do with Python), and I agree
it would probably be a good idea to codify and document a standard way
of doing so.

Thanks
Wes

On Mon, Oct 19, 2020 at 12:34 PM Pasha Stetsenko <st...@gmail.com> wrote:
>
> Hi everybody,
>
> I've been reading http://arrow.apache.org/docs/format/CDataInterface.html, which has been
> "... inspired by the Python buffer protocol", and I can't find any details on how to connect this
> protocol with other libraries/applications.
>
> Here's what I mean: with the Python buffer protocol, I can create a new type and set its
> `tp_as_buffer` field to a `PyBufferProcs` structure. This way any other library can call
> `PyObject_CheckBuffer()` on my object to check whether or not it supports the buffer interface,
> and then `PyObject_GetBuffer()` to use that interface.
>
> I could not find the corresponding mechanisms in the Arrow C data interface. For example,
> consider the "Exporting a simple int32 array" tutorial in the article above. After creating
> `export_int32_type()`, `release_int32_type()`, `export_int32_array()`, `release_int32_array()`
> -- how do I announce to the world that these functions are available? Conversely, if I want to
> talk to an Arrow Table via this interface -- where do I find the endpoints that return
> `ArrowSchema` and `ArrowArray` structures?
>
> (I understand that there is an additional, more complicated API for accessing Arrow objects,
> http://arrow.apache.org/docs/python/extending.html, but this seems to be a completely different
> API from what CDataInterface describes).

Re: Arrow C Data Interface

Posted by Antoine Pitrou <an...@python.org>.
Hi Pasha,

It would be helpful to know in which broader context you're asking.  Are
you trying to do something in particular?

> i can't find any details on how to connect this
> protocol with other libraries/applications.

You use those libraries/applications' dedicated APIs.

Just like in Python, when a library's API says "you can pass any object
defining the buffer protocol for argument XXX", you can do just that.

The C data interface is not an *API*.  It defines a standard for
exchanging data.  How that data is exposed or consumed is up to
third parties.
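
To illustrate the distinction, here is a minimal sketch in C. The two
struct definitions follow the Arrow C data interface specification;
everything else is an assumption made for illustration. In particular,
`mylib_export_int32()` and `consume_sum_int32()` are hypothetical names
standing in for whatever dedicated entry points a producing or consuming
library chooses to offer. The standard fixes only the struct layout and
the release-callback contract, not how the functions are named or found.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
/* Struct layouts as defined by the Arrow C data interface specification. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif

/* Producer side: release callbacks mark the structs as released. */
static void mylib_release_schema(struct ArrowSchema* schema) {
  /* format and name are string literals, so there is nothing to free. */
  schema->release = NULL;
}

static void mylib_release_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]); /* data buffer copied at export time */
  free(array->buffers);
  array->release = NULL;
}

/* Hypothetical library-specific entry point: this function, not the
 * standard, is what a producer documents for its users.
 * Returns 0 on success, -1 on failure. */
int mylib_export_int32(const int32_t* values, int64_t n,
                       struct ArrowSchema* out_schema,
                       struct ArrowArray* out_array) {
  if (n < 0) return -1;
  const void** buffers = malloc(2 * sizeof(*buffers));
  int32_t* data = malloc((n > 0 ? (size_t)n : 1) * sizeof(*data));
  if (buffers == NULL || data == NULL) {
    free(buffers);
    free(data);
    return -1;
  }
  if (n > 0) memcpy(data, values, (size_t)n * sizeof(*data));
  buffers[0] = NULL; /* no validity bitmap because null_count is 0 */
  buffers[1] = data;

  *out_schema = (struct ArrowSchema){
      .format = "i", /* "i" is the format string for int32 */
      .name = "", .metadata = NULL, .flags = 0,
      .n_children = 0, .children = NULL, .dictionary = NULL,
      .release = mylib_release_schema, .private_data = NULL};
  *out_array = (struct ArrowArray){
      .length = n, .null_count = 0, .offset = 0,
      .n_buffers = 2, .n_children = 0, .buffers = buffers,
      .children = NULL, .dictionary = NULL,
      .release = mylib_release_array, .private_data = NULL};
  return 0;
}

/* Consumer side (also hypothetical): relies only on the struct layout,
 * never on the producer's internals. */
int64_t consume_sum_int32(const struct ArrowSchema* schema,
                          const struct ArrowArray* array) {
  assert(strcmp(schema->format, "i") == 0);
  assert(array->n_buffers == 2 && array->null_count == 0);
  const int32_t* data = (const int32_t*)array->buffers[1] + array->offset;
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; ++i) total += data[i];
  return total;
}
```

A caller would pass stack-allocated `struct ArrowSchema` and
`struct ArrowArray` values to the export function, hand them to the
consumer, and finally invoke each struct's `release` callback once it is
done with the data.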

Regards

Antoine.




