Posted to jira@arrow.apache.org by "Chang She (Jira)" <ji...@apache.org> on 2022/10/12 18:03:00 UTC
[jira] [Created] (ARROW-18013) Cannot concatenate extension arrays
Chang She created ARROW-18013:
---------------------------------
Summary: Cannot concatenate extension arrays
Key: ARROW-18013
URL: https://issues.apache.org/jira/browse/ARROW-18013
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Python
Affects Versions: 9.0.0
Reporter: Chang She
`pa.Table.take` and `pa.ChunkedArray.combine_chunks` raise an exception for extension arrays.
https://github.com/apache/arrow/blob/apache-arrow-9.0.0/cpp/src/arrow/array/concatenate.cc#L440
Quick example:
```
In [1]: import pyarrow as pa
In [2]: class LabelType(pa.ExtensionType):
   ...:
   ...:     def __init__(self):
   ...:         super(LabelType, self).__init__(pa.string(), "label")
   ...:
   ...:     def __arrow_ext_serialize__(self):
   ...:         return b""
   ...:
   ...:     @classmethod
   ...:     def __arrow_ext_deserialize__(cls, storage_type, serialized):
   ...:         return LabelType()
   ...:
In [3]: import numpy as np
In [4]: chunk1 = pa.ExtensionArray.from_storage(LabelType(), pa.array(np.repeat('a', 1000)))
In [5]: chunk2 = pa.ExtensionArray.from_storage(LabelType(), pa.array(np.repeat('b', 1000)))
In [6]: pa.chunked_array([chunk1, chunk2]).combine_chunks()
---------------------------------------------------------------------------
ArrowNotImplementedError Traceback (most recent call last)
Cell In [6], line 1
----> 1 pa.chunked_array([chunk1, chunk2]).combine_chunks()
File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/table.pxi:700, in pyarrow.lib.ChunkedArray.combine_chunks()
File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/array.pxi:2889, in pyarrow.lib.concat_arrays()
File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()
File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:121, in pyarrow.lib.check_status()
ArrowNotImplementedError: concatenation of extension<label<LabelType>>
```
Would it be possible to concatenate the storage arrays and then "re-box" the result into the ExtensionType?