Posted to issues@arrow.apache.org by "Ben Baumgold (Jira)" <ji...@apache.org> on 2022/02/23 16:26:00 UTC
[jira] [Created] (ARROW-15767) Arrow Table with Nullable DenseUnion fails to convert to Python Pandas DataFrame
Ben Baumgold created ARROW-15767:
------------------------------------
Summary: Arrow Table with Nullable DenseUnion fails to convert to Python Pandas DataFrame
Key: ARROW-15767
URL: https://issues.apache.org/jira/browse/ARROW-15767
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 6.0.1
Reporter: Ben Baumgold
Attachments: nothing.arrow
A Feather file containing a column of nullable values raises an error when converted to a Pandas DataFrame. It can be read into a pyarrow.Table as follows:
{code:python}
In [1]: import pyarrow.feather as feather
In [2]: t = feather.read_table("nothing.arrow")
In [3]: t
Out[3]:
pyarrow.Table
col: dense_union<: null=0, : int32 not null=1>
child 0, : null
child 1, : int32 not null
----
col: [ -- is_valid: all not null -- type_ids: [
1,
1,
1,
0
] -- value_offsets: [
0,
1,
2,
0
] -- child 0 type: null
1 nulls -- child 1 type: int32
[
1,
2,
3
]]
{code}
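For readers unfamiliar with the dense union layout printed above, here is a minimal pure-Python sketch of how the three buffers combine into logical values. It does not use pyarrow. The function name decode_dense_union is hypothetical, and the sketch assumes each type id equals its child index, which holds for this column (null=0, int32=1) but not for unions with custom type codes.

```python
from typing import Any, List, Sequence


def decode_dense_union(
    type_ids: Sequence[int],
    value_offsets: Sequence[int],
    children: Sequence[Sequence[Any]],
) -> List[Any]:
    """Resolve each slot of a dense union to its logical value.

    Hypothetical helper for illustration only. Assumes type id == child index.
    """
    if len(type_ids) != len(value_offsets):
        raise ValueError("type_ids and value_offsets must have the same length")
    # Slot i lives in child type_ids[i], at position value_offsets[i].
    return [children[t][o] for t, o in zip(type_ids, value_offsets)]


# Buffers copied from the table printed above.
type_ids = [1, 1, 1, 0]
value_offsets = [0, 1, 2, 0]
children = [
    [None],     # child 0: null type, holding the single null
    [1, 2, 3],  # child 1: int32 not null
]

values = decode_dense_union(type_ids, value_offsets, children)
print(values)
```

Under these assumptions the column holds three integers followed by one missing value, so the union's own validity ("all not null") is consistent with the null being stored in child 0.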
But when trying to convert the pyarrow.Table into a Pandas DataFrame, I get the following error:
{code:python}
In [4]: t.to_pandas()
---------------------------------------------------------------------------
ArrowNotImplementedError Traceback (most recent call last)
<ipython-input-25-8ba84762c39a> in <module>
----> 1 t.to_pandas()
~/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
787 _check_data_column_metadata_consistency(all_columns)
788 columns = _deserialize_column_index(table, all_columns, column_indexes)
--> 789 blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
790
791 axes = [columns, index]
~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
1126 # Convert an arrow table to Block from the internal pandas API
1127 columns = block_table.column_names
-> 1128 result = pa.lib.table_to_blocks(options, block_table, categories,
1129 list(extension_columns.keys()))
1130 return [_reconstruct_block(item, columns, extension_columns)
~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()
~/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type dense_union<: null=0, : int32 not null=1> is known.
{code}
Note the Arrow file is valid and can be read successfully by [Arrow.jl|https://github.com/apache/arrow-julia]. A related issue is [arrow-julia#285|https://github.com/apache/arrow-julia/issues/285]. The [^nothing.arrow] file used in this example is attached for convenience.