Posted to dev@arrow.apache.org by "Samuel Jones (Jira)" <ji...@apache.org> on 2020/02/21 18:02:00 UTC

[jira] [Created] (ARROW-7914) Allow pandas datetime as index for feather

Samuel Jones created ARROW-7914:
-----------------------------------

             Summary: Allow pandas datetime as index for feather
                 Key: ARROW-7914
                 URL: https://issues.apache.org/jira/browse/ARROW-7914
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
    Affects Versions: 0.15.1
         Environment: Windows, Python 3.6.7
            Reporter: Samuel Jones
         Attachments: PEC fine course 1 grid 199001.csv, PEC fine course 1 grid 199001.feather

Sorry in advance if I mess anything up. This is my first issue.

I have hourly data for 3 years using a Pandas datetime as the index. Pandas allows me to load/save .csv with the following code (only one month with 2 variables shown):
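`
import pandas as pd

# Write data to .csv (jan90 is a DataFrame with a datetime index)
jan90.to_csv('PEC fine course 1 grid 199001.csv', index=True)

# Load data from .csv, parsing the first column back into the datetime index
jan90 = pd.read_csv('PEC fine course 1 grid 199001.csv', index_col=0, parse_dates=True)
`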
Using .csv works, but is slow when I get to the full dataset of 26k+ rows and 21.6k+ columns (and more columns may be coming if I have to add lags to my data). So, a more efficient load/save routine is very desirable. I was excited when I found feather, but the lost index is a no-go for my use.
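For reference, one possible workaround in the meantime is to move the datetime index into an ordinary column before writing and to restore it after reading. The following is only a minimal sketch, not a fix for the issue itself: the helper names and the "timestamp" column name are hypothetical, and it assumes pyarrow is installed and that all column names are strings.

```python
import pandas as pd


def index_to_column(df, name="timestamp"):
    """Return a copy of df with its index stored as a regular column.

    `name` is a hypothetical column name chosen for this sketch.
    """
    return df.rename_axis(name).reset_index()


def column_to_index(df, name="timestamp"):
    """Turn the saved column back into the index after loading."""
    return df.set_index(name)


def save_feather_with_index(df, path, name="timestamp"):
    """Write df to a feather file, keeping the index as a column."""
    index_to_column(df, name).to_feather(path)


def load_feather_with_index(path, name="timestamp"):
    """Read a feather file written by save_feather_with_index."""
    return column_to_index(pd.read_feather(path), name)
```

With these helpers, save_feather_with_index(jan90, 'PEC fine course 1 grid 199001.feather') followed by load_feather_with_index on the same path should give back a frame indexed by the original datetimes, at the cost of one extra column in the file.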

Thanks for your consideration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)