Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2020/11/09 17:03:00 UTC

[jira] [Created] (ARROW-10523) [Python] Pandas timestamps are inferred to have only microsecond precision

David Li created ARROW-10523:
--------------------------------

             Summary: [Python] Pandas timestamps are inferred to have only microsecond precision
                 Key: ARROW-10523
                 URL: https://issues.apache.org/jira/browse/ARROW-10523
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 2.0.0
            Reporter: David Li


{code:python}
import pandas as pd
import pyarrow as pa

# Build an array from a pandas Timestamp that carries a nanosecond component.
arr = pa.array([pd.Timestamp(year=2020, month=1, day=1, nanosecond=999)])
print(arr)
print(arr.type)
{code}
This gives:
{noformat}
[
  2020-01-01 00:00:00.000000
]
timestamp[us]
{noformat}
However, pandas Timestamps have nanosecond precision, and it would be nice for type inference to preserve it. Here the 999 ns component is dropped.

The reason is that TypeInferrer [hardcodes microseconds|https://github.com/apache/arrow/blob/apache-arrow-2.0.0/cpp/src/arrow/python/inference.cc#L466], since it only knows about the standard library datetime. I'm therefore treating this as a feature request rather than a bug. This can be worked around easily by specifying an explicit type.
