Posted to dev@arrow.apache.org by "Farzad Abdolhosseini (Jira)" <ji...@apache.org> on 2020/05/19 22:37:00 UTC

[jira] [Created] (ARROW-8868) [Python] Feather format cannot store/retrieve lists correctly?

Farzad Abdolhosseini created ARROW-8868:
-------------------------------------------

             Summary: [Python] Feather format cannot store/retrieve lists correctly?
                 Key: ARROW-8868
                 URL: https://issues.apache.org/jira/browse/ARROW-8868
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.17.1
         Environment: Python 3.8.2
PyArrow 0.17.1
Pandas 1.0.3
Linux (Manjaro)
            Reporter: Farzad Abdolhosseini


I'm seeing very strange behavior when I store and retrieve a Pandas data-frame using the Feather format. Simplified example:
{code:python}
>>> import pandas as pd
>>> df = pd.DataFrame(data={"scalar": [1, 2], "array": [[1], [7]]})
>>> df
 scalar array
0     1   [1]
1     2   [7]
>>> df.to_feather("test.ft")
>>> pd.read_feather("test.ft")
  scalar                  array
0      1                   [16]
1      2  [1045468844972122628]
{code}
As you can see, the retrieved data is incorrect. I originally tried the `feather-format` package directly (without going through Pandas), and that did not work either.

By varying the data-frame that is stored, I can also get different but still incorrect behavior, e.g. a larger list, an error that says the file size is incorrect, or simply a segmentation fault.
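
Because some variations end in a segmentation fault, it may help to run each trial in a child interpreter so that a crash does not take down the interactive session. The following is only a sketch using the standard library; `run_isolated` is a hypothetical helper name, and the signal reporting relies on POSIX behavior (a negative return code from `subprocess.run`).

```python
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a child interpreter and report how it ended."""
    result = subprocess.run(
        [sys.executable, "-c", snippet], capture_output=True, text=True
    )
    if result.returncode < 0:
        # On POSIX, a negative return code means the child was killed by
        # that signal number (11 is SIGSEGV on Linux).
        return f"killed by signal {-result.returncode}"
    if result.returncode != 0:
        lines = result.stderr.strip().splitlines()
        detail = lines[-1] if lines else f"exit code {result.returncode}"
        return "error: " + detail
    return "ok"


if __name__ == "__main__":
    # A harmless snippet; the reproduction from this report could be
    # passed as a string in the same way.
    print(run_isolated("print('hello')"))
```

Each variation of the data-frame can then be tried as its own snippet, and the result string distinguishes a clean run, a Python exception, and a crash.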

 

This is my first time using Feather/Arrow BTW.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)