Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2022/06/24 19:32:00 UTC

[jira] [Assigned] (ARROW-16898) [Python] TypeError from `Table.from_pandas(df)` when df using non-str index name

     [ https://issues.apache.org/jira/browse/ARROW-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Li reassigned ARROW-16898:
--------------------------------

    Assignee: Martin Liu

> [Python] TypeError from `Table.from_pandas(df)` when df using non-str index name
> --------------------------------------------------------------------------------
>
>                 Key: ARROW-16898
>                 URL: https://issues.apache.org/jira/browse/ARROW-16898
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Martin Liu
>            Assignee: Martin Liu
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> When calling {{Table.from_pandas(df)}}, the current code does not convert {{index}} names to str (although it does [convert {{column}} names to str|https://github.com/apache/arrow/blob/apache-arrow-8.0.0/python/pyarrow/pandas_compat.py#L356]), so the call fails if df has a *non-str index name*.
> Code to reproduce:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6]})
> df = df.set_index(0)
> pa.Table.from_pandas(df)
> {code}
> Error:
> {code:java}
>  ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> Input In [3], in <module>
>       4 df = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6]})
>       5 df = df.set_index(0)
> ----> 6 pa.Table.from_pandas(df)
> File ~/src/mlpsandboxrt/venv/lib/python3.8/site-packages/pyarrow/table.pxi:1394, in pyarrow.lib.Table.from_pandas()
> File ~/src/mlpsandboxrt/venv/lib/python3.8/site-packages/pyarrow/pandas_compat.py:610, in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
>     608     for name, type_ in zip(all_names, types):
>     609         name = name if name is not None else 'None'
> --> 610         fields.append(pa.field(name, type_))
>     611     schema = pa.schema(fields)
>     613 pandas_metadata = construct_metadata(df, column_names, index_columns,
>     614                                      index_descriptors, preserve_index,
>     615                                      types)
> File ~/src/mlpsandboxrt/venv/lib/python3.8/site-packages/pyarrow/types.pxi:1698, in pyarrow.lib.field()
> File stringsource:15, in string.from_py.__pyx_convert_string_from_py_std__in_string()
> TypeError: expected bytes, int found{code}
>  
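Until the index names are converted inside pyarrow itself, one possible workaround is to stringify them on the pandas side before the conversion. The sketch below is only an illustration under that assumption: the helper name `with_str_index_names` is hypothetical and is not part of pandas or pyarrow, and it is not the fix proposed in the linked pull request.

```python
import pandas as pd


def with_str_index_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index level names are str.

    Hypothetical helper for illustration only. Unnamed levels (None) are
    left as None rather than being turned into the string 'None'.
    """
    out = df.copy()
    # set_names returns a new Index, so the caller's frame is not modified.
    out.index = out.index.set_names(
        [None if name is None else str(name) for name in out.index.names]
    )
    return out


# Same frame as in the reproduction: the index is named with the int 0.
df = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6]}).set_index(0)
fixed = with_str_index_names(df)
print(list(fixed.index.names))
```

Passing `fixed` instead of `df` to `pa.Table.from_pandas` should avoid the `TypeError` above, because every index name reaching `pa.field` is then a str. Column names are left untouched, since the description notes that pyarrow already converts those.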


