Posted to dev@arrow.apache.org by "Samuel Jones (Jira)" <ji...@apache.org> on 2020/02/21 18:02:00 UTC
[jira] [Created] (ARROW-7914) Allow pandas datetime as index for feather
Samuel Jones created ARROW-7914:
-----------------------------------
Summary: Allow pandas datetime as index for feather
Key: ARROW-7914
URL: https://issues.apache.org/jira/browse/ARROW-7914
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Affects Versions: 0.15.1
Environment: Windows, python 3.6.7,
Reporter: Samuel Jones
Attachments: PEC fine course 1 grid 199001.csv, PEC fine course 1 grid 199001.feather
Sorry in advance if I mess anything up. This is my first issue.
I have 3 years of hourly data with a pandas datetime as the index. Pandas allows me to load and save .csv files with the following code (only one month with 2 variables shown):
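`
import pandas as pd

# Write data to .csv, keeping the datetime index as the first column
jan90.to_csv('PEC fine course 1 grid 199001.csv', index=True)
# Load data from .csv, restoring the first column as a parsed datetime index
jan90 = pd.read_csv('PEC fine course 1 grid 199001.csv', index_col=0, parse_dates=True)
`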
Using .csv works, but it is slow on the full dataset of 26k+ rows and 21.6k+ columns (and more columns may be coming if I have to add lags to my data), so a more efficient load/save routine is very desirable. I was excited when I found feather, but it does not preserve the index, which is a no-go for my use.
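For illustration, one possible workaround in the meantime is to move the datetime index into an ordinary column before writing and to restore it after reading. This is only a sketch, not the requested feature: the helper names and the "timestamp" column name are made up for this example, it assumes no existing column already uses that name, and it assumes all column names are strings.

```python
import pandas as pd

# Hypothetical column name used to carry the index through the file.
INDEX_COLUMN = "timestamp"


def index_to_column(df, name=INDEX_COLUMN):
    """Return a copy of df with its index moved into a regular column."""
    return df.rename_axis(name).reset_index()


def column_to_index(df, name=INDEX_COLUMN):
    """Return a copy of df with the named column restored as the index."""
    return df.set_index(name)


def save_feather(df, path):
    """Write df to feather, carrying the datetime index as a column."""
    index_to_column(df).to_feather(path)


def load_feather(path):
    """Read a file written by save_feather and rebuild the index."""
    return column_to_index(pd.read_feather(path))
```

With these helpers, save_feather(jan90, 'PEC fine course 1 grid 199001.feather') followed by load_feather on the same path should give back a frame indexed by datetime. The restored index is named "timestamp" even if the original index had no name.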
Thanks for your consideration.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)