Posted to jira@arrow.apache.org by "Kouhei Sutou (Jira)" <ji...@apache.org> on 2022/10/12 21:11:00 UTC

[jira] [Updated] (ARROW-18013) [C++][Python] Cannot concatenate extension arrays

     [ https://issues.apache.org/jira/browse/ARROW-18013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kouhei Sutou updated ARROW-18013:
---------------------------------
    Summary: [C++][Python] Cannot concatenate extension arrays  (was: [Python] Cannot concatenate extension arrays)

> [C++][Python] Cannot concatenate extension arrays
> -------------------------------------------------
>
>                 Key: ARROW-18013
>                 URL: https://issues.apache.org/jira/browse/ARROW-18013
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 9.0.0
>            Reporter: Chang She
>            Priority: Major
>
> `pa.Table.take` and `pa.ChunkedArray.combine_chunks` raise an exception for extension arrays.
> https://github.com/apache/arrow/blob/apache-arrow-9.0.0/cpp/src/arrow/array/concatenate.cc#L440
> Quick example:
> ```
> In [1]: import pyarrow as pa
> In [2]: class LabelType(pa.ExtensionType):
>    ...: 
>    ...:          def __init__(self):
>    ...:              super(LabelType, self).__init__(pa.string(), "label")
>    ...: 
>    ...:          def __arrow_ext_serialize__(self):
>    ...:              return b""
>    ...: 
>    ...:          @classmethod
>    ...:          def __arrow_ext_deserialize__(cls, storage_type, serialized):
>    ...:              return LabelType()
>    ...: 
> In [3]: import numpy as np
> In [4]: chunk1 = pa.ExtensionArray.from_storage(LabelType(), pa.array(np.repeat('a', 1000)))
> In [5]: chunk2 = pa.ExtensionArray.from_storage(LabelType(), pa.array(np.repeat('b', 1000)))
> In [6]: pa.chunked_array([chunk1, chunk2]).combine_chunks()
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> Cell In [6], line 1
> ----> 1 pa.chunked_array([chunk1, chunk2]).combine_chunks()
> File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/table.pxi:700, in pyarrow.lib.ChunkedArray.combine_chunks()
> File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/array.pxi:2889, in pyarrow.lib.concat_arrays()
> File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()
> File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:121, in pyarrow.lib.check_status()
> ArrowNotImplementedError: concatenation of extension<label<LabelType>>
> ```
> Would it be possible to concatenate the storage arrays and then "re-box" the result into the ExtensionType?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)