Posted to jira@arrow.apache.org by "Mike (Jira)" <ji...@apache.org> on 2022/06/01 16:57:00 UTC

[jira] [Commented] (ARROW-7914) [Python] Allow pandas datetime as index for feather

    [ https://issues.apache.org/jira/browse/ARROW-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545025#comment-17545025 ] 

Mike commented on ARROW-7914:
-----------------------------

[~jorisvandenbossche] I tested this locally, and it looks like `freq` is not preserved on the round trip. Here's an example:
{code:python}
import io

import pandas as pd
from pyarrow import feather

# DataFrame indexed by a daily DatetimeIndex (freq='D')
df = pd.DataFrame({"A": [1, 2, 3]})
df.index = pd.date_range("20130101", periods=3)

# Round-trip through Feather via an in-memory buffer
stream = io.BytesIO()
feather.write_feather(df=df, dest=stream)
stream.seek(0)  # rewind the buffer before reading it back
f_df = feather.read_feather(stream)

# The index values survive, but freq is dropped:
>>> df.index
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='datetime64[ns]', freq='D')
>>> f_df.index
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='datetime64[ns]', freq=None)
>>> df.index.freq
<Day>
>>> f_df.index.freq
>>> 
{code}
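Until `freq` is preserved natively, one possible workaround is to stash it in the Arrow schema metadata yourself. What follows is a minimal sketch, not pyarrow API. It assumes Feather V2 keeps custom schema metadata, which is also how the `b"pandas"` entry that rebuilds the index is carried. The helpers `write_feather_with_freq`/`read_feather_with_freq` and the `b"index_freq"` key are hypothetical names.

{code:python}
import io

import pandas as pd
import pyarrow as pa
from pyarrow import feather

# Hypothetical metadata key; not a pyarrow or pandas convention.
FREQ_KEY = b"index_freq"


def write_feather_with_freq(df: pd.DataFrame, dest) -> None:
    """Hypothetical helper: write df to Feather, stashing the index freq."""
    table = pa.Table.from_pandas(df)
    # freqstr is None when the DatetimeIndex has no freq; other index
    # types (e.g. RangeIndex) have no freqstr attribute at all.
    freq = getattr(df.index, "freqstr", None)
    if freq is not None:
        # Merge with existing metadata so the b"pandas" entry is kept.
        meta = dict(table.schema.metadata or {})
        meta[FREQ_KEY] = freq.encode()
        table = table.replace_schema_metadata(meta)
    feather.write_feather(table, dest)


def read_feather_with_freq(source) -> pd.DataFrame:
    """Hypothetical helper: read Feather and reapply a stashed freq."""
    table = feather.read_table(source)
    df = table.to_pandas()  # rebuilds the DatetimeIndex from b"pandas"
    meta = table.schema.metadata or {}
    if FREQ_KEY in meta:
        # pandas validates that the index values actually conform to freq.
        df.index = pd.DatetimeIndex(df.index, freq=meta[FREQ_KEY].decode())
    return df


# Usage, mirroring the example above
df = pd.DataFrame({"A": [1, 2, 3]}, index=pd.date_range("20130101", periods=3))
buf = io.BytesIO()
write_feather_with_freq(df, buf)
buf.seek(0)
f_df = read_feather_with_freq(buf)
print(f_df.index.freq)
{code}

A lighter alternative, if the series is known to be regular, is to re-infer it after reading with `pd.DatetimeIndex(f_df.index, freq="infer")`. This does not record what the original freq was, though.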

> [Python] Allow pandas datetime as index for feather
> ---------------------------------------------------
>
>                 Key: ARROW-7914
>                 URL: https://issues.apache.org/jira/browse/ARROW-7914
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>    Affects Versions: 0.15.1
>         Environment: Windows, Python 3.6.7
>            Reporter: Samuel Jones
>            Assignee: saloni jain
>            Priority: Minor
>              Labels: arrow, datetime, feather, pull-request-available, python
>             Fix For: 8.0.0
>
>         Attachments: PEC fine course 1 grid 199001.csv, PEC fine course 1 grid 199001.feather
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Sorry in advance if I mess anything up. This is my first issue.
> I have 3 years of hourly data indexed by a pandas DatetimeIndex. Pandas lets me load and save .csv with the following code (only one month with 2 variables shown):
> {code:python}
> # Write data to .csv
> jan90.to_csv('PEC fine course 1 grid 199001.csv', index=True)
> # Load data from .csv
> jan90 = pd.read_csv('PEC fine course 1 grid 199001.csv', index_col=0, parse_dates=True)
> {code}
> Using .csv works, but it is slow on the full dataset of 26k+ rows and 21.6k+ columns, and more columns may be coming if I have to add lags to my data. So a more efficient load/save routine is very desirable. I was excited when I found Feather, but losing the index is a no-go for my use.
> Thanks for your consideration.
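For reference, here is a minimal sketch of the round trip the reporter asked for. It assumes a pyarrow release that includes this change (the Fix For field says 8.0.0). The data is made up: one month of hourly rows with two hypothetical columns, `var1` and `var2`. It is written to a temporary directory rather than to the reporter's file. The DatetimeIndex values should come back without `index_col`/`parse_dates`, although, per the comment above, `freq` may not.

{code:python}
import os
import tempfile

import numpy as np
import pandas as pd
from pyarrow import feather

# Hypothetical stand-in for the reporter's data: one month of hourly rows
# with two made-up variables, indexed by a pandas DatetimeIndex.
index = pd.date_range("1990-01-01", periods=31 * 24, freq=pd.offsets.Hour())
jan90 = pd.DataFrame(
    {"var1": np.arange(len(index), dtype="float64"), "var2": np.zeros(len(index))},
    index=index,
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "jan90.feather")
    # A non-default index is stored as an extra column plus pandas metadata.
    feather.write_feather(jan90, path)
    # read_feather rebuilds the DatetimeIndex; no index_col/parse_dates needed.
    restored = feather.read_feather(path)

print(restored.index[:3])
{code}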



--
This message was sent by Atlassian Jira
(v8.20.7#820007)