Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/01/07 23:43:00 UTC

[jira] [Comment Edited] (ARROW-10523) [Python] Pandas timestamps are inferred to have only microsecond precision

    [ https://issues.apache.org/jira/browse/ARROW-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260884#comment-17260884 ] 

Weston Pace edited comment on ARROW-10523 at 1/7/21, 11:42 PM:
---------------------------------------------------------------

Arrow can handle nanosecond timestamps to and from pandas.  I think the issue here is that you are passing a Python list (as opposed to a numpy ndarray or a pandas Series) to pyarrow, so it treats the elements as generic Python objects and sees each one as a standard-library datetime, which only has microsecond precision.

Try the following...
{code:python}
import pyarrow as pa
import pandas as pd
arr = pa.array(pd.Series([pd.Timestamp(year=2020, month=1, day=1, nanosecond=999)]))
print(arr)
print(arr.type){code}


> [Python] Pandas timestamps are inferred to have only microsecond precision
> --------------------------------------------------------------------------
>
>                 Key: ARROW-10523
>                 URL: https://issues.apache.org/jira/browse/ARROW-10523
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 2.0.0
>            Reporter: David Li
>            Priority: Minor
>
> {code:python}
> import pyarrow as pa
> import pandas as pd
> arr = pa.array([pd.Timestamp(year=2020, month=1, day=1, nanosecond=999)])
> print(arr)
> print(arr.type) {code}
> This gives:
> {noformat}
> [
>   2020-01-01 00:00:00.000000
> ]
> timestamp[us]
> {noformat}
> However, Pandas Timestamps have nanosecond precision, which would be nice to preserve in inference.
> The reason is that TypeInferrer [hardcodes microseconds|https://github.com/apache/arrow/blob/apache-arrow-2.0.0/cpp/src/arrow/python/inference.cc#L466] as it only knows about the standard library datetime, so I'm treating this as a feature request and not quite a bug. Of course, this can be worked around easily by specifying an explicit type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)