Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/04/19 10:23:00 UTC
[jira] [Created] (ARROW-16231) [C++][Python] IPC failure for dictionary with extension type with struct storage type
Joris Van den Bossche created ARROW-16231:
---------------------------------------------
Summary: [C++][Python] IPC failure for dictionary with extension type with struct storage type
Key: ARROW-16231
URL: https://issues.apache.org/jira/browse/ARROW-16231
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Python
Reporter: Joris Van den Bossche
Report from [https://github.com/apache/arrow/issues/12899]
Round-tripping a dictionary type through IPC/Feather fails when the dictionary values have an extension type with a nested storage type. Writing appears to succeed (although it is unclear whether the written file is correct, since trying to read the schema gives an error), but reading it back fails with "ArrowInvalid: Ran out of field metadata, likely malformed".
The original use case involved a pandas extension type: the pandas interval dtype is mapped to an Arrow extension type with a struct type as storage, and in this case the interval type was further wrapped in a categorical (dictionary) type. The following pandas-based test, which can be added as-is to {{test_feather.py}}, reproduces the failure:
{code:python}
@pytest.mark.pandas
def test_dictionary_interval():
    # pd.cut yields a categorical column whose categories are intervals
    df = pd.DataFrame({'a': pd.cut(range(1, 10, 3), [-1, 5, 10])})
    _check_pandas_roundtrip(df, version=2)
{code}
This gives:
{code:java}
$ pytest python/pyarrow/tests/test_feather.py::test_dictionary_interval
....
========================= FAILURES =================
____________ test_dictionary_interval _______________
pyarrow/_feather.pyx:88: in pyarrow._feather.FeatherReader.read
E pyarrow.lib.ArrowInvalid: Ran out of field metadata, likely malformed
E ../src/arrow/ipc/reader.cc:266 GetFieldMetadata(field_index_++, out_)
E ../src/arrow/ipc/reader.cc:283 LoadCommon(type_id)
E ../src/arrow/ipc/reader.cc:324 Load(child_fields[i].get(), parent->child_data[i].get())
E ../src/arrow/ipc/reader.cc:529 loader.Load(&field, column.get())
E ../src/arrow/ipc/reader.cc:1188 ReadRecordBatchInternal( *message->metadata(), schema_, field_inclusion_mask_, context, reader.get())
E ../src/arrow/ipc/feather.cc:730 reader->ReadRecordBatch(i)
pyarrow/error.pxi:100: ArrowInvalid
{code}