Posted to issues@arrow.apache.org by "Antoine Pitrou (JIRA)" <ji...@apache.org> on 2018/06/29 14:37:00 UTC

[jira] [Comment Edited] (ARROW-2646) [Python] Pandas roundtrip for date objects

    [ https://issues.apache.org/jira/browse/ARROW-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527728#comment-16527728 ] 

Antoine Pitrou edited comment on ARROW-2646 at 6/29/18 2:36 PM:
----------------------------------------------------------------

The standard {{csv}} module has both a notion of "dialect" and additional {{**kwargs}} to each function so that you can override individual options. Intuitively, it allows accepting individual option arguments without listing and documenting them explicitly for each method.

I tend to prefer the options object / dialect approach myself, but it's true I'm more in the library developer camp :-)
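
For illustration, here is a minimal sketch of the {{csv}} pattern described above, using only the standard library. The dialect name "pipe" and the sample data are made up for this example; the point is that a keyword argument passed to {{csv.reader}} takes precedence over the corresponding dialect setting.

```python
import csv
import io

# Hypothetical dialect registered only for this illustration:
# it bundles a set of options under a single name.
csv.register_dialect("pipe", delimiter="|")

# 1. Use the dialect as-is.
pipe_data = "a|b|c\r\n1|2|3\r\n"
rows_dialect = list(csv.reader(io.StringIO(pipe_data), dialect="pipe"))

# 2. Keep the dialect but override one option through **kwargs.
semicolon_data = "a;b;c\r\n1;2;3\r\n"
rows_override = list(
    csv.reader(io.StringIO(semicolon_data), dialect="pipe", delimiter=";")
)

print(rows_dialect)
print(rows_override)
```

Both calls parse their input into the same rows. The second one shows how a caller can adjust a single option without defining a new dialect.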


was (Author: pitrou):
The standard {{csv}} module has both a notion of "dialect" and addition {{**kwargs}} to each function to that you can override individual options. Intuitively, it allows accepting individual option arguments without listing and documenting them explicitly for each method.

> [Python] Pandas roundtrip for date objects
> ------------------------------------------
>
>                 Key: ARROW-2646
>                 URL: https://issues.apache.org/jira/browse/ARROW-2646
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>            Reporter: Florian Jetter
>            Priority: Minor
>             Fix For: 0.10.0
>
>
> Arrow currently casts date objects to nanosecond-precision datetime objects. I'd like to have a way to preserve the type during a roundtrip:
> {code}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> import datetime
> >>> pa.date32().to_pandas_dtype()
> dtype('<M8[ns]')
> >>> df = pd.DataFrame({'date': [datetime.date(2018, 1, 1)]})
> >>> df.dtypes
> date    object
> dtype: object
> >>> df_roundtrip = pa.Table.from_pandas(df).to_pandas()
> >>> df_roundtrip.dtypes
> date    datetime64[ns]
> dtype: object
> {code}
> I'd expect something like this to work:
> {code}
> >>> import pandas.testing as pdt
> >>> df_roundtrip = pa.Table.from_pandas(df).to_pandas(date_as_object=True)
> >>> pdt.assert_frame_equal(df_roundtrip, df)
> {code}


