Posted to user@arrow.apache.org by Michael <mi...@gmail.com> on 2022/07/08 20:23:31 UTC

ExtensionArray Examples

I'm trying to create some ExtensionArrays in pandas and pyarrow, but I'm
having trouble figuring out the relationships between them.

I've taken a look at what the pandas developers have been working on for the
next release of pandas
<https://github.com/pandas-dev/pandas/tree/main/pandas/core/arrays/arrow>,
and while some of it is helpful, it's focused on backing native pandas
types with Arrow arrays. I'd like to do something similar, but for scalar
classes that are not part of pandas.

I think I need to create four classes, each with some of the relevant
methods:

   - pandas ExtensionArray subclass
      - __arrow_array__
   - pandas ExtensionDtype subclass
   - pyarrow ExtensionArray subclass
   - pyarrow ExtensionType subclass
      - __arrow_ext_serialize__
      - __arrow_ext_deserialize__
      - __arrow_ext_class__
      - to_pandas_dtype

Is anybody aware of some good concrete examples of how to organize these
classes?

Thanks!

Best,
Michael

Re: ExtensionArray Examples

Posted by Michael <mi...@gmail.com>.
Ah thanks! It looks like the upcoming ExtensionScalar hooks
<https://github.com/apache/arrow/pull/13454> are exactly what I was looking
for. Very exciting!

Michael


On Fri, Jul 8, 2022 at 5:11 PM Rok Mihevc <ro...@gmail.com> wrote:

> Hey Michael,
>
>
> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_extension_type.py
> might have the material you need.
>
> Rok

Re: ExtensionArray Examples

Posted by Rok Mihevc <ro...@gmail.com>.
Hey Michael,

https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_extension_type.py
might have the material you need.

Rok
