Posted to jira@arrow.apache.org by "Rok Mihevc (Jira)" <ji...@apache.org> on 2022/10/03 17:27:00 UTC

[jira] [Commented] (ARROW-17918) [Python] ExtensionArray.__getitem__ is not called if called from StructArray

    [ https://issues.apache.org/jira/browse/ARROW-17918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612385#comment-17612385 ] 

Rok Mihevc commented on ARROW-17918:
------------------------------------

[~jorisvandenbossche] what's your opinion on this?

> [Python] ExtensionArray.__getitem__ is not called if called from StructArray
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-17918
>                 URL: https://issues.apache.org/jira/browse/ARROW-17918
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Rok Mihevc
>            Priority: Major
>              Labels: pyarrow
>
> It seems that when getting a value from a StructScalar, the extension information is lost. See:
> {code:python}
> import pyarrow as pa
> class ExampleScalar(pa.ExtensionScalar):
>     def as_py(self):
>         print(f"ExampleScalar.as_py -> {self.value.as_py()}")
>         return self.value.as_py()
> class ExampleArray(pa.ExtensionArray):
>     def __getitem__(self, item):
>         return f"ExampleArray.__getitem__[{item}] -> {self.storage[item]}"
>     def __arrow_ext_scalar_class__(self):
>         return ExampleScalar
> class ExampleType(pa.ExtensionType):
>     def __init__(self):
>         pa.ExtensionType.__init__(self, pa.int64(), "ExampleExtensionType")
>     def __arrow_ext_serialize__(self):
>         return b""
>     def __arrow_ext_class__(self):
>         return ExampleArray
> example_type = ExampleType()
> arr = pa.array([1, 2, 3])
> example_array = pa.ExtensionArray.from_storage(example_type, arr)
> example_array2 = pa.StructArray.from_arrays([example_array, arr], ["a", "b"])
> print("\nExample 1\n=========")
> print(example_array[0])
> print(example_array.type)
> print(type(example_array[0]))
> print("\nExample 2\n=========")
> print(example_array2[0])
> print(example_array2[0].type)
> print(example_array2[0]["a"])
> print(example_array2[0]["a"].type)
> {code}
> Returns:
> {code}
> Example 1
> =========
> ExampleArray.__getitem__[0] -> 1
> extension<ExampleExtensionType<ExampleType>>
> <class 'str'>
> Example 2
> =========
> [('a', 1), ('b', 1)]
> struct<a: extension<ExampleExtensionType<ExampleType>>, b: int64>
> 1
> extension<ExampleExtensionType<ExampleType>>
> {code}
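
The difference between the two examples can be illustrated with a toy model in plain Python. This is only a sketch of one plausible reading of the output, not a description of pyarrow internals: the classes Storage, CustomArray, DirectStruct, and DispatchingStruct are hypothetical names invented for this illustration. It shows how a container that reads its children's storage directly never reaches a child's overridden __getitem__, while a container that indexes each child through its own class does.

```python
# Toy model only: every class here is hypothetical and does not mirror pyarrow.

class Storage:
    """Plain storage array: indexing returns the raw value."""

    def __init__(self, values):
        self.values = list(values)

    def __getitem__(self, i):
        return self.values[i]


class CustomArray:
    """Wrapper with a user-defined __getitem__, standing in for ExampleArray."""

    def __init__(self, storage):
        self.storage = storage

    def __getitem__(self, i):
        return f"CustomArray.__getitem__[{i}] -> {self.storage[i]}"


class DirectStruct:
    """Builds a row from each child's storage, bypassing any override."""

    def __init__(self, children):
        self.children = children

    def __getitem__(self, i):
        # Children without a .storage attribute are treated as raw storage.
        return {
            name: getattr(child, "storage", child)[i]
            for name, child in self.children.items()
        }


class DispatchingStruct(DirectStruct):
    """Builds a row by indexing each child through its own __getitem__."""

    def __getitem__(self, i):
        return {name: child[i] for name, child in self.children.items()}


storage = Storage([1, 2, 3])
custom = CustomArray(storage)

direct_row = DirectStruct({"a": custom, "b": storage})[0]
dispatched_row = DispatchingStruct({"a": custom, "b": storage})[0]

print(direct_row)      # the override on "a" is never called
print(dispatched_row)  # the override on "a" is called
```

In this model, DirectStruct reproduces the shape of "Example 2" above (field "a" comes back as the bare storage value), and DispatchingStruct shows the behaviour the issue title asks about.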


