Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/31 13:08:00 UTC
[jira] [Updated] (ARROW-15253) [Python] Error in to_pandas for empty dataframe with pd.interval_range index
[ https://issues.apache.org/jira/browse/ARROW-15253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-15253:
-----------------------------------
Labels: conversion pandas pull-request-available (was: conversion pandas)
> [Python] Error in to_pandas for empty dataframe with pd.interval_range index
> ----------------------------------------------------------------------------
>
> Key: ARROW-15253
> URL: https://issues.apache.org/jira/browse/ARROW-15253
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Alenka Frim
> Assignee: Alenka Frim
> Priority: Major
> Labels: conversion, pandas, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In _table_to_blocks ([pandas_compat.py|https://github.com/apache/arrow/blob/08096d4125fcbfe43ecf48614a15f1205cd4e8f3/python/pyarrow/pandas_compat.py#L1130-L1138]), the input _extension_columns_ is equal to {None: interval[int64, right]} for _pd.interval_range_, so an error is raised because None cannot be encoded to bytes. The same happens for _pd.PeriodIndex_.
> Example:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame(index=pd.interval_range(start=0, end=5))
> table = pa.table(df)
> table.to_pandas()
> {code}
> Error (this traceback was captured with the equivalent _pd.PeriodIndex_ case):
> {code:java}
> TypeError Traceback (most recent call last)
> /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/ipykernel_13963/1439451337.py in <module>
> 1 df5 = pd.DataFrame(index=pd.PeriodIndex(year=[2000, 2002], quarter=[1, 3]))
> 2 table5 = pa.table(df5)
> ----> 3 table5.to_pandas().shape
> ~/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
> 764 self_destruct=self_destruct
> 765 )
> --> 766 return self._to_pandas(options, categories=categories,
> 767 ignore_metadata=ignore_metadata,
> 768 types_mapper=types_mapper)
> ~/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
> 1819 types_mapper=None):
> 1820 from pyarrow.pandas_compat import table_to_blockmanager
> -> 1821 mgr = table_to_blockmanager(
> 1822 options, self, categories,
> 1823 ignore_metadata=ignore_metadata,
> ~/repos/arrow/python/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
> 787 _check_data_column_metadata_consistency(all_columns)
> 788 columns = _deserialize_column_index(table, all_columns, column_indexes)
> --> 789 blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
> 790
> 791 axes = [columns, index]
> ~/repos/arrow/python/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
> 1133 # Convert an arrow table to Block from the internal pandas API
> 1134 columns = block_table.column_names
> -> 1135 result = pa.lib.table_to_blocks(options, block_table, categories,
> 1136 list(extension_columns.keys()))
> 1137 return [_reconstruct_block(item, columns, extension_columns)
> ~/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()
> 1215 c_options.categorical_columns = {tobytes(cat) for cat in categories}
> 1216 if extension_columns is not None:
> -> 1217 c_options.extension_columns = {tobytes(col)
> 1218 for col in extension_columns}
> 1219
> ~/repos/arrow/python/pyarrow/lib.cpython-39-darwin.so in set.from_py.__pyx_convert_unordered_set_from_py_std_3a__3a_string()
> ~/repos/arrow/python/pyarrow/lib.cpython-39-darwin.so in string.from_py.__pyx_convert_string_from_py_std__in_string()
> TypeError: expected bytes, NoneType found
> {code}
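
The failure described above can be illustrated without pyarrow. The following is a minimal sketch, not the actual pyarrow code and not necessarily how the issue was resolved: tobytes here is a simplified stand-in for the internal helper, and encode_extension_columns is a hypothetical name introduced only for illustration. It assumes that an unnamed index shows up as a None key in the extension columns mapping, as the report states.

```python
# Simplified stand-in for the internal helper (assumption): str is encoded,
# bytes pass through, and anything else is rejected, mirroring the reported
# "expected bytes, NoneType found" failure.
def tobytes(value):
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    raise TypeError(f"expected bytes, {type(value).__name__} found")


# Hypothetical helper: encode column names, skipping keys that are None.
def encode_extension_columns(extension_columns):
    return {tobytes(name) for name in extension_columns if name is not None}


# Shape of the mapping from the report: the unnamed index gives a None key.
ext_columns = {None: "interval[int64, right]"}

try:
    {tobytes(name) for name in ext_columns}
except TypeError as exc:
    print("unfiltered:", exc)

print("filtered:", encode_extension_columns(ext_columns))
```

Skipping None keys is only one possible way to avoid the TypeError; whether the index dtype should still be restored through another path is not covered by this sketch.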