Posted to issues@arrow.apache.org by "Ben Baumgold (Jira)" <ji...@apache.org> on 2022/02/23 16:26:00 UTC
[jira] [Created] (ARROW-15767) Arrow Table with Nullable DenseUnion fails to convert to Python Pandas DataFrame

Ben Baumgold created ARROW-15767:
------------------------------------

             Summary: Arrow Table with Nullable DenseUnion fails to convert to Python Pandas DataFrame
                 Key: ARROW-15767
                 URL: https://issues.apache.org/jira/browse/ARROW-15767
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 6.0.1
            Reporter: Ben Baumgold
         Attachments: nothing.arrow

A Feather file containing a column of nullable values raises an error when converted to a Pandas DataFrame. The file can be read into a pyarrow.Table as follows:
{code:python}
In [1]: import pyarrow.feather as feather

In [2]: t = feather.read_table("nothing.arrow")

In [3]: t
Out[3]:
pyarrow.Table
col: dense_union<: null=0, : int32 not null=1>
  child 0, : null
  child 1, : int32 not null
----
col: [  -- is_valid: all not null  -- type_ids:     [
      1,
      1,
      1,
      0
    ]  -- value_offsets:     [
      0,
      1,
      2,
      0
    ]  -- child 0 type: null
1 nulls  -- child 1 type: int32
    [
      1,
      2,
      3
    ]]
{code}
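For readers unfamiliar with the dense union layout, here is a minimal pure-Python sketch of how the buffers printed above resolve to logical values. The helper {{decode_dense_union}} is hypothetical (it is not a pyarrow API), and it assumes the type codes map directly to child indices, as the schema above ({{null=0}}, {{int32 not null=1}}) suggests.

```python
from typing import Any, Dict, List, Sequence


def decode_dense_union(
    type_ids: Sequence[int],
    value_offsets: Sequence[int],
    children: Dict[int, Sequence[Any]],
) -> List[Any]:
    """Resolve each slot of a dense union to its logical value.

    Hypothetical helper for illustration only. For slot i, type_ids[i]
    selects the child array and value_offsets[i] is the index into it.
    """
    values = []
    for type_id, offset in zip(type_ids, value_offsets):
        child = children[type_id]
        values.append(child[offset])
    return values


# Buffers copied from the table printed above.
type_ids = [1, 1, 1, 0]
value_offsets = [0, 1, 2, 0]
children = {
    0: [None],     # child 0: null type, a single null slot
    1: [1, 2, 3],  # child 1: int32 not null
}

logical = decode_dense_union(type_ids, value_offsets, children)
print(logical)  # [1, 2, 3, None]
```

In other words, the column logically holds three integers followed by one missing value. The union itself carries no validity bitmap; the missing value comes from the slot that points into the null-typed child.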
However, converting the pyarrow.Table into a Pandas DataFrame raises the following error:
{code:python}
In [4]: t.to_pandas()
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-25-8ba84762c39a> in <module>
----> 1 t.to_pandas()

~/miniconda3/lib/python3.9/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()

~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()

~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
    787     _check_data_column_metadata_consistency(all_columns)
    788     columns = _deserialize_column_index(table, all_columns, column_indexes)
--> 789     blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
    790
    791     axes = [columns, index]

~/miniconda3/lib/python3.9/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
   1126     # Convert an arrow table to Block from the internal pandas API
   1127     columns = block_table.column_names
-> 1128     result = pa.lib.table_to_blocks(options, block_table, categories,
   1129                                     list(extension_columns.keys()))
   1130     return [_reconstruct_block(item, columns, extension_columns)

~/miniconda3/lib/python3.9/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()

~/miniconda3/lib/python3.9/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type dense_union<: null=0, : int32 not null=1> is known.
{code}
Note that the Arrow file is valid and can be read successfully by [Arrow.jl|https://github.com/apache/arrow-julia]. A related issue is [arrow-julia#285|https://github.com/apache/arrow-julia/issues/285]. The [^nothing.arrow] file used in this example is attached for convenience.


