Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/04/19 10:23:00 UTC

[jira] [Created] (ARROW-16231) [C++][Python] IPC failure for dictionary with extension type with struct storage type

Joris Van den Bossche created ARROW-16231:
---------------------------------------------

             Summary: [C++][Python] IPC failure for dictionary with extension type with struct storage type
                 Key: ARROW-16231
                 URL: https://issues.apache.org/jira/browse/ARROW-16231
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Python
            Reporter: Joris Van den Bossche


Report from [https://github.com/apache/arrow/issues/12899]

Roundtripping through IPC/Feather fails for a dictionary type whose dictionary values are an extension type with a nested storage type. Writing appears to succeed (although it is unclear whether the written file is "correct", since trying to read the schema gives an error), but reading the file back fails with _"ArrowInvalid: Ran out of field metadata, likely malformed"_.

The original use case came from a pandas extension type: the pandas interval dtype is mapped to an Arrow extension type with a struct storage type, and in this case the interval type was further wrapped in a categorical (dictionary) type. A pandas-based test that reproduces this (it can be added as-is to {{test_feather.py}}):
{code:python}
@pytest.mark.pandas
def test_dictionary_interval():
    df = pd.DataFrame({'a': pd.cut(range(1, 10, 3), [-1, 5, 10])})
    _check_pandas_roundtrip(df, version=2)
{code}
This gives:
{code}
$ pytest python/pyarrow/tests/test_feather.py::test_dictionary_interval
....
========================= FAILURES =================
____________ test_dictionary_interval _______________

pyarrow/_feather.pyx:88: in pyarrow._feather.FeatherReader.read

E   pyarrow.lib.ArrowInvalid: Ran out of field metadata, likely malformed
E   ../src/arrow/ipc/reader.cc:266  GetFieldMetadata(field_index_++, out_)
E   ../src/arrow/ipc/reader.cc:283  LoadCommon(type_id)
E   ../src/arrow/ipc/reader.cc:324  Load(child_fields[i].get(), parent->child_data[i].get())
E   ../src/arrow/ipc/reader.cc:529  loader.Load(&field, column.get())
E   ../src/arrow/ipc/reader.cc:1188  ReadRecordBatchInternal( *message->metadata(), schema_, field_inclusion_mask_, context, reader.get())
E   ../src/arrow/ipc/feather.cc:730  reader->ReadRecordBatch(i)

pyarrow/error.pxi:100: ArrowInvalid
{code}
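
A pandas-free reproduction might look like the sketch below. It has not been checked against the failing build, so treat it as an assumption that it hits the same code path. {{IntervalLikeType}}, its extension name {{example.interval_like}}, and the helper {{roundtrip_dictionary_of_extension}} are hypothetical names made up for illustration; the struct fields only loosely mimic the pandas interval storage. The helper writes a dictionary column with an extension-typed dictionary to an in-memory IPC file and returns either the table read back or the {{ArrowInvalid}} raised on read.
{code:python}
import pyarrow as pa


class IntervalLikeType(pa.ExtensionType):
    """Hypothetical extension type whose storage type is a struct."""

    def __init__(self):
        storage = pa.struct([("left", pa.int64()), ("right", pa.int64())])
        super().__init__(storage, "example.interval_like")

    def __arrow_ext_serialize__(self):
        return b""  # the type has no parameters to serialize

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return cls()


def roundtrip_dictionary_of_extension():
    """Write an IPC file and read it back; return (table, result_or_error)."""
    ext_type = IntervalLikeType()
    try:
        pa.register_extension_type(ext_type)
    except pa.ArrowKeyError:
        pass  # already registered earlier in this process

    # Dictionary values: an extension array backed by struct storage.
    storage = pa.array(
        [{"left": -1, "right": 5}, {"left": 5, "right": 10}],
        type=ext_type.storage_type,
    )
    dictionary = pa.ExtensionArray.from_storage(ext_type, storage)

    # Indices matching the pandas example: the values 1, 4, 7 fall into
    # the intervals (-1, 5], (-1, 5], (5, 10].
    indices = pa.array([0, 0, 1], type=pa.int8())
    column = pa.DictionaryArray.from_arrays(indices, dictionary)
    table = pa.table({"a": column})

    # Feather V2 is the IPC file format, so write an in-memory IPC file.
    sink = pa.BufferOutputStream()
    with pa.ipc.new_file(sink, table.schema) as writer:
        writer.write_table(table)

    try:
        return table, pa.ipc.open_file(sink.getvalue()).read_all()
    except pa.ArrowInvalid as exc:
        return table, exc  # the failure described in this report


if __name__ == "__main__":
    original, outcome = roundtrip_dictionary_of_extension()
    print(type(outcome).__name__, outcome)
{code}
If the sketch does trigger the bug, {{outcome}} would be the {{ArrowInvalid}} exception; on a build without the bug it should be a table equal to {{original}}.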


