Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/05/20 10:05:00 UTC

[jira] [Closed] (ARROW-8868) [Python] Feather format cannot store/retrieve lists correctly?

     [ https://issues.apache.org/jira/browse/ARROW-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche closed ARROW-8868.
----------------------------------------
    Resolution: Duplicate

> [Python] Feather format cannot store/retrieve lists correctly?
> --------------------------------------------------------------
>
>                 Key: ARROW-8868
>                 URL: https://issues.apache.org/jira/browse/ARROW-8868
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.17.1
>         Environment: Python 3.8.2
> PyArrow 0.17.1
> Pandas 1.0.3
> Linux (Manjaro)
>            Reporter: Farzad Abdolhosseini
>            Priority: Major
>
> I'm seeing very weird behavior when I try to store and retrieve a Pandas DataFrame using the Feather format. Simplified example:
> {code:python}
> >>> import pandas as pd
> >>> df = pd.DataFrame(data={"scalar": [1, 2], "array": [[1], [7]]})
> >>> df
>    scalar array
> 0       1   [1]
> 1       2   [7]
> >>> df.to_feather("test.ft")
> >>> pd.read_feather("test.ft")
>    scalar                  array
> 0       1                   [16]
> 1       2  [1045468844972122628]
> {code}
> As you can see, the retrieved data is incorrect. I was originally trying to use the `feather-format` package (not using Pandas directly) and that didn't work well either.
> By playing around with the DataFrame that is to be stored, I can also get different but still incorrect behavior, e.g. a larger list, an error that says the file size is incorrect, or simply a segmentation fault.
>  
> This is my first time using Feather/Arrow BTW.
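
The round trip described in the report can also be checked programmatically rather than by eye. The following is only a minimal sketch, assuming pandas is installed; the helper names `to_plain_lists` and `list_column_round_trips` are hypothetical and not part of pandas or PyArrow. The writer and reader are passed in as callables so the same check can be pointed at any on-disk format.

```python
import os
import tempfile

import pandas as pd


def to_plain_lists(column):
    """Convert each cell (list, tuple, or NumPy array) to a plain Python list.

    Hypothetical helper: list cells may come back from disk as NumPy arrays,
    so both sides are normalized before they are compared.
    """
    return [list(cell) for cell in column]


def list_column_round_trips(df, column, write, read):
    """Write `df` with `write(df, path)`, read it back with `read(path)`,
    and report whether the list-valued `column` survived unchanged.

    Hypothetical helper, not an API of pandas or PyArrow.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "roundtrip.bin")
        write(df, path)
        restored = read(path)
    return to_plain_lists(restored[column]) == to_plain_lists(df[column])


# Same frame as in the report above.
df = pd.DataFrame(data={"scalar": [1, 2], "array": [[1], [7]]})

# Baseline using pickle, which needs no extra dependency.
pickle_ok = list_column_round_trips(
    df, "array", lambda frame, path: frame.to_pickle(path), pd.read_pickle
)
print("pickle round trip intact:", pickle_ok)
```

To exercise the Feather path from the report (this additionally requires PyArrow), the same helper could be called with `lambda frame, path: frame.to_feather(path)` as the writer and `pd.read_feather` as the reader. On an affected version the check would be expected to return `False`, or the read itself may raise or crash, as the reporter describes.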



--
This message was sent by Atlassian Jira
(v8.3.4#803005)