Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/10/21 11:27:00 UTC

[jira] [Created] (ARROW-18124) [Python] Support converting to non-nano datetime64 for pandas >= 2.0

Joris Van den Bossche created ARROW-18124:
---------------------------------------------

             Summary: [Python] Support converting to non-nano datetime64 for pandas >= 2.0
                 Key: ARROW-18124
                 URL: https://issues.apache.org/jira/browse/ARROW-18124
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Joris Van den Bossche
             Fix For: 11.0.0


Pandas is adding the capability to store non-nanosecond datetime64 data. At the moment, however, we always convert to nanoseconds, regardless of the timestamp resolution of the Arrow table (and regardless of the pandas metadata).

Using the development version of pandas:

{code}
In [1]: df = pd.DataFrame({"col": np.arange("2012-01-01", 10, dtype="datetime64[s]")})

In [2]: df.dtypes
Out[2]: 
col    datetime64[s]
dtype: object

In [3]: table = pa.table(df)

In [4]: table.schema
Out[4]: 
col: timestamp[s]
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 423

In [6]: table.to_pandas().dtypes
Out[6]: 
col    datetime64[ns]
dtype: object
{code}

This is because we have a {{coerce_temporal_nanoseconds}} conversion option, which we hardcode to True for top-level columns (and to False for nested data).

When users have pandas >= 2, we should support conversion that preserves the resolution. We should certainly do so if the pandas metadata indicates which resolution was originally used (to ensure a correct roundtrip).
We _could_ (and at some point also _should_) do the same by default if there is no pandas metadata (but maybe only later, depending on how stable this new feature is in pandas, as it is potentially a breaking change for our users, e.g. if you use pyarrow to read a Parquet file).
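
To make the proposed rule concrete, here is a minimal sketch of the decision logic in plain Python. It is not pyarrow code: the function {{target_unit}} and its parameters ({{arrow_unit}}, {{metadata_unit}}, {{preserve_without_metadata}}) are hypothetical names used only for illustration, and the sketch assumes the original resolution has already been extracted from the pandas metadata.

```python
def target_unit(arrow_unit, pandas_version, metadata_unit=None,
                preserve_without_metadata=False):
    """Pick the datetime64 unit to use when converting a timestamp column.

    Hypothetical helper for illustration only (not part of pyarrow).

    arrow_unit: resolution of the Arrow timestamp type, e.g. "s" or "ms".
    pandas_version: installed pandas version string, e.g. "2.0.0".
    metadata_unit: resolution recorded in the pandas metadata, or None
        if the table carries no pandas metadata for this column.
    preserve_without_metadata: whether to keep the Arrow resolution when
        there is no pandas metadata (the possible future default).
    """
    major = int(pandas_version.split(".")[0])
    if major < 2:
        # Older pandas only stores nanosecond data, so always coerce.
        return "ns"
    if metadata_unit is not None:
        # The metadata says what the user originally had: roundtrip it.
        return metadata_unit
    # No metadata: keep coercing to nanoseconds unless opted in.
    return arrow_unit if preserve_without_metadata else "ns"
```

With this sketch, {{target_unit("s", "2.0.0", metadata_unit="s")}} yields {{"s"}} (the roundtrip case from the example above), while {{target_unit("s", "2.0.0")}} still yields {{"ns"}} until preserving the resolution becomes the default.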


