Posted to jira@arrow.apache.org by "Kouhei Sutou (Jira)" <ji...@apache.org> on 2022/10/12 21:11:00 UTC
[jira] [Updated] (ARROW-18013) [C++][Python] Cannot concatenate extension arrays
[ https://issues.apache.org/jira/browse/ARROW-18013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kouhei Sutou updated ARROW-18013:
---------------------------------
Summary: [C++][Python] Cannot concatenate extension arrays (was: [Python] Cannot concatenate extension arrays)
> [C++][Python] Cannot concatenate extension arrays
> -------------------------------------------------
>
> Key: ARROW-18013
> URL: https://issues.apache.org/jira/browse/ARROW-18013
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Affects Versions: 9.0.0
> Reporter: Chang She
> Priority: Major
>
> `pa.Table.take` and `pa.ChunkedArray.combine_chunks` raise an exception for extension arrays.
> https://github.com/apache/arrow/blob/apache-arrow-9.0.0/cpp/src/arrow/array/concatenate.cc#L440
> Quick example:
> ```
> In [1]: import pyarrow as pa
> In [2]: class LabelType(pa.ExtensionType):
>    ...:
>    ...:     def __init__(self):
>    ...:         super(LabelType, self).__init__(pa.string(), "label")
>    ...:
>    ...:     def __arrow_ext_serialize__(self):
>    ...:         return b""
>    ...:
>    ...:     @classmethod
>    ...:     def __arrow_ext_deserialize__(cls, storage_type, serialized):
>    ...:         return LabelType()
>    ...:
> In [3]: import numpy as np
> In [4]: chunk1 = pa.ExtensionArray.from_storage(LabelType(), pa.array(np.repeat('a', 1000)))
> In [5]: chunk2 = pa.ExtensionArray.from_storage(LabelType(), pa.array(np.repeat('b', 1000)))
> In [6]: pa.chunked_array([chunk1, chunk2]).combine_chunks()
> ---------------------------------------------------------------------------
> ArrowNotImplementedError Traceback (most recent call last)
> Cell In [6], line 1
> ----> 1 pa.chunked_array([chunk1, chunk2]).combine_chunks()
> File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/table.pxi:700, in pyarrow.lib.ChunkedArray.combine_chunks()
> File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/array.pxi:2889, in pyarrow.lib.concat_arrays()
> File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()
> File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:121, in pyarrow.lib.check_status()
> ArrowNotImplementedError: concatenation of extension<label<LabelType>>
> ```
> Would it be possible to concatenate the storage arrays and then "re-box" the result into the ExtensionType?