Posted to jira@arrow.apache.org by "Chang She (Jira)" <ji...@apache.org> on 2022/10/12 18:03:00 UTC

[jira] [Created] (ARROW-18013) Cannot concatenate extension arrays

Chang She created ARROW-18013:
---------------------------------

             Summary: Cannot concatenate extension arrays
                 Key: ARROW-18013
                 URL: https://issues.apache.org/jira/browse/ARROW-18013
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Python
    Affects Versions: 9.0.0
            Reporter: Chang She


`pa.Table.take` and `pa.ChunkedArray.combine_chunks` raise an exception for extension arrays.

https://github.com/apache/arrow/blob/apache-arrow-9.0.0/cpp/src/arrow/array/concatenate.cc#L440

Quick example:
```
In [1]: import pyarrow as pa

In [2]: class LabelType(pa.ExtensionType):
   ...: 
   ...:          def __init__(self):
   ...:              super(LabelType, self).__init__(pa.string(), "label")
   ...: 
   ...:          def __arrow_ext_serialize__(self):
   ...:              return b""
   ...: 
   ...:          @classmethod
   ...:          def __arrow_ext_deserialize__(cls, storage_type, serialized):
   ...:              return LabelType()
   ...: 

In [3]: import numpy as np

In [4]: chunk1 = pa.ExtensionArray.from_storage(LabelType(), pa.array(np.repeat('a', 1000)))

In [5]: chunk2 = pa.ExtensionArray.from_storage(LabelType(), pa.array(np.repeat('b', 1000)))

In [6]: pa.chunked_array([chunk1, chunk2]).combine_chunks()
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In [6], line 1
----> 1 pa.chunked_array([chunk1, chunk2]).combine_chunks()

File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/table.pxi:700, in pyarrow.lib.ChunkedArray.combine_chunks()

File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/array.pxi:2889, in pyarrow.lib.concat_arrays()

File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()

File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:121, in pyarrow.lib.check_status()

ArrowNotImplementedError: concatenation of extension<label<LabelType>>

```

Would it be possible to concatenate the storage arrays and then "re-box" the result as the ExtensionType?


