Posted to dev@arrow.apache.org by "Razvan Chitu (Jira)" <ji...@apache.org> on 2019/10/16 08:37:00 UTC
[jira] [Created] (ARROW-6899) to_pandas() not implemented on list<dictionary<values=string, indices=int32>>

Razvan Chitu created ARROW-6899:
-----------------------------------

             Summary: to_pandas() not implemented on list<dictionary<values=string, indices=int32>>
                 Key: ARROW-6899
                 URL: https://issues.apache.org/jira/browse/ARROW-6899
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.15.0, 0.13.0
            Reporter: Razvan Chitu
         Attachments: encoded.arrow

Hi,

{{pyarrow.Table.to_pandas()}} fails on an Arrow List Vector whose data (child) vector is a dictionary-encoded string vector. Here is the table schema as printed by pyarrow:
{code:java}
pyarrow.Table
encodedList: list<$data$: dictionary<values=string, indices=int32, ordered=0> not null> not null
  child 0, $data$: dictionary<values=string, indices=int32, ordered=0> not null
metadata
--------
OrderedDict() {code}
and the data (also attached to this ticket as encoded.arrow):
{code:java}
<pyarrow.lib.ChunkedArray object at 0x7f7ea6a748b8>
[
  [

    -- dictionary:
      [
        "a",
        "b",
        "c",
        "d"
      ]
    -- indices:
      [
        0,
        1,
        2
      ],

    -- dictionary:
      [
        "a",
        "b",
        "c",
        "d"
      ]
    -- indices:
      [
        0,
        3
      ]
  ]
] {code}
and the exception I got:
{code:java}
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-10-5f865bc01df1> in <module>
----> 1 df.to_pandas()

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata)
    700 
    701     _check_data_column_metadata_consistency(all_columns)
--> 702     blocks = _table_to_blocks(options, table, categories)
    703     columns = _deserialize_column_index(table, all_columns, column_indexes)
    704 

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories)
    972 
    973     # Convert an arrow table to Block from the internal pandas API
--> 974     result = pa.lib.table_to_blocks(options, block_table, categories)
    975 
    976     # Defined above

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()

~/.local/share/virtualenvs/jupyter-BKbz0SEp/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: Not implemented type for list in DataFrameBlock: dictionary<values=string, indices=int32, ordered=0> {code}
Note that the data vector on its own (the dictionary-encoded string vector, outside the list) converts successfully with {{to_pandas()}}.
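
For reference, the rows this column should decode to can be sketched in plain Python, without pyarrow. This is only an illustration of the expected result, not pyarrow's conversion code: the flat layout (one shared dictionary, a flat index array, and list offsets [0, 3, 5]) is an assumption inferred from the data printed above, and {{decode_list_of_dictionary}} is a hypothetical helper name.

```python
from typing import List, Sequence


def decode_list_of_dictionary(
    offsets: Sequence[int],
    indices: Sequence[int],
    dictionary: Sequence[str],
) -> List[List[str]]:
    """Decode a list<dictionary<string>> column into plain Python lists.

    Hypothetical helper for illustration only. It assumes no nulls, which
    matches the "not null" fields in the schema above.
    """
    rows = []
    # Consecutive offsets delimit the slice of `indices` owned by each row.
    for start, stop in zip(offsets[:-1], offsets[1:]):
        # Each index is a position in the shared dictionary of strings.
        rows.append([dictionary[i] for i in indices[start:stop]])
    return rows


# Values taken from the data shown above; the offsets are assumed.
dictionary = ["a", "b", "c", "d"]
indices = [0, 1, 2, 0, 3]
offsets = [0, 3, 5]

expected_rows = decode_list_of_dictionary(offsets, indices, dictionary)
print(expected_rows)  # [['a', 'b', 'c'], ['a', 'd']]
```

In other words, the converted DataFrame would be expected to hold one object column with the two rows printed by the sketch.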

It would be great if this could be addressed in the next version of pyarrow. In the meantime, is there a workaround I can apply on my end to bypass this unimplemented conversion?

Thanks,

Razvan


