Posted to user@arrow.apache.org by Xiaozhen Liu <ja...@seu.edu.cn> on 2020/07/30 09:31:45 UTC

Error with Arrow Table and Pandas DataFrame conversion

Hi everyone,

I’m using pyarrow to convert an Arrow Table with a column of type List<Struct> to a pandas.DataFrame; the table is passed from Java to Python using Arrow Flight. pyarrow has no problem converting the table to a DataFrame, but converting that DataFrame back to an Arrow Table fails with an ArrowTypeError. The Struct has 6 child fields, each either Int or Utf8.

Why am I getting this error when the forward conversion (Arrow Table -> pandas DataFrame) succeeds? Is this feature not implemented yet? And how can I fix it?

Thank you.


Best,
Xiaozhen Liu

Re: Error with Arrow Table and Pandas DataFrame conversion

Posted by Wes McKinney <we...@gmail.com>.
Indeed, it seems that structs are not handled as items of lists that are
represented as NumPy ndarrays, which is how list values come from pandas:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/python_to_arrow.cc#L759

Thanks for the report; I have filed
https://issues.apache.org/jira/browse/ARROW-9610

On Thu, Jul 30, 2020 at 6:20 PM Xiaozhen Liu <ja...@seu.edu.cn> wrote:
> pyarrow.lib.ArrowTypeError: ('Unknown list item type: struct<attributeName: string, end: int64, key: string, start: int64, tokenOffset: int64, value: string>', 'Conversion failed for column payload with type object')

RE: Error with Arrow Table and Pandas DataFrame conversion

Posted by Xiaozhen Liu <ja...@seu.edu.cn>.
Hi,

Sorry for not being clear.
My pyarrow version is 0.17.1.

Here is the full stack trace:

Traceback (most recent call last):
  File "tobacco_relevancy_classify.py", line 169, in do_action
    output_data = pyarrow.Table.from_pandas(output_dataframe)
  File "pyarrow\table.pxi", line 1451, in pyarrow.lib.Table.from_pandas
  File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 575, in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
  File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 575, in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
  File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 566, in convert_column
    raise e
  File "C:\Users\Jamie\AppData\Local\Programs\Python\Python37\lib\site-packages\pyarrow\pandas_compat.py", line 560, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow\array.pxi", line 265, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 80, in pyarrow.lib._ndarray_to_array
  File "pyarrow\error.pxi", line 108, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: ('Unknown list item type: struct<attributeName: string, end: int64, key: string, start: int64, tokenOffset: int64, value: string>', 'Conversion failed for column payload with type object')

The column that causes this error has the following type:

payload: list<Span: struct<attributeName: string, start: int32, end: int32, key: string, value: string, tokenOffset: int32>>
  child 0, Span: struct<attributeName: string, start: int32, end: int32, key: string, value: string, tokenOffset: int32>
      child 0, attributeName: string
      child 1, start: int32
      child 2, end: int32
      child 3, key: string
      child 4, value: string
      child 5, tokenOffset: int32

This column can be converted to a DataFrame successfully, but it cannot be converted back to an Arrow Table.

Thank you.

Xiaozhen Liu

From: Micah Kornfield
Sent: Thursday, July 30, 2020 10:56 PM
To: user@arrow.apache.org
Subject: Re: Error with Arrow Table and Pandas DataFrame conversion

Please include pyarrow version as well.



Re: Error with Arrow Table and Pandas DataFrame conversion

Posted by Micah Kornfield <em...@gmail.com>.
Please include the pyarrow version as well.

On Thursday, July 30, 2020, Wes McKinney <we...@gmail.com> wrote:

> Could you provide more complete details about the error (an example if
> possible and the full error and stacktrace)?

Re: Error with Arrow Table and Pandas DataFrame conversion

Posted by Wes McKinney <we...@gmail.com>.
Could you provide more complete details about the error (an example, if
possible, and the full error message and stack trace)?

On Thu, Jul 30, 2020 at 4:32 AM Xiaozhen Liu <ja...@seu.edu.cn> wrote:
> Why am I getting this kind of error when forward conversion (Arrow Table -> Pandas Dataframe) is successful? Is this a feature not implemented? And, how can I fix this?