Posted to issues@arrow.apache.org by "Abdul Rahman (JIRA)" <ji...@apache.org> on 2017/05/28 02:39:04 UTC

[jira] [Updated] (ARROW-1074) from_pandas doesn't convert ndarray to list

     [ https://issues.apache.org/jira/browse/ARROW-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abdul Rahman updated ARROW-1074:
--------------------------------
    Description: 
[Feel free to change the issue type, because this is probably by design.]

I have noticed that if one of the columns in the Parquet file is of type array, the pyarrow Table stores it as a list:
>>> table[3].type
DataType(list<element: string>)
If I call .to_pandas() on the column, I get something like this:
>>> table[3].to_pandas()
0         None
1          [7]
2         [46]
dtype: object

However, I can't call pyarrow.Table.from_pandas on a DataFrame that has a Series/column of the above ndarrays. I get this error:
Invalid: Python object of type ndarray is not None and is not a string, bool, float, int, date, decimal object

If to_pandas() can convert a list to an ndarray, shouldn't from_pandas also convert an ndarray to a list type in the table?


> from_pandas doesn't convert ndarray to list
> -------------------------------------------
>
>                 Key: ARROW-1074
>                 URL: https://issues.apache.org/jira/browse/ARROW-1074
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.4.0
>            Reporter: Abdul Rahman
>            Priority: Minor
>              Labels: pyarrow
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)