Posted to issues@arrow.apache.org by "Kevin Glasson (Jira)" <ji...@apache.org> on 2020/02/14 06:04:00 UTC

[jira] [Updated] (ARROW-7856) [Python] to_pandas() causing datetimes > pd.Timestamp.max to wrap around

     [ https://issues.apache.org/jira/browse/ARROW-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Glasson updated ARROW-7856:
---------------------------------
    Summary: [Python] to_pandas() causing datetimes > pd.Timestamp.max to wrap around  (was: to_pandas() Causing datetimes > pd.Timestamp.max to wrap around)

> [Python] to_pandas() causing datetimes > pd.Timestamp.max to wrap around
> ------------------------------------------------------------------------
>
>                 Key: ARROW-7856
>                 URL: https://issues.apache.org/jira/browse/ARROW-7856
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.15.1
>         Environment: Distributor ID: Ubuntu
> Description:    Ubuntu 18.04.4 LTS
> Release:        18.04
> Codename:       bionic
> Python 3.7.3
> In [3]: pa.__version__
> Out[3]: '0.15.1'
> In [4]: pd.__version__
> Out[4]: '0.25.2'
>            Reporter: Kevin Glasson
>            Priority: Major
>
> When writing a dataframe that contains `datetime.datetime` values in an object column, any datetime greater than pd.Timestamp.max or less than pd.Timestamp.min wraps around.
>  
> For reference these are the timestamp min and max values.
>  
> {code:java}
> In [43]: pd.Timestamp.max
> Out[43]: Timestamp('2262-04-11 23:47:16.854775807')
> In [44]: pd.Timestamp.min
> Out[44]: Timestamp('1677-09-21 00:12:43.145225')
> {code}
>  
>  
> To reproduce the error using pandas:
>  
> {code:java}
> In [49]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})
> In [50]: df
> Out[50]:
>                      A
> 0  2262-04-12 00:00:00
> In [51]: df.to_parquet("datetimething.parquet")
> In [52]: pd.read_parquet("datetimething.parquet")
> Out[52]:
>                               A
> 0 1677-09-21 00:25:26.290448384
> {code}
> I have narrowed it down to the conversion of a `pa.Table` with the `to_pandas()` method: the Arrow column still holds the correct value, and the wrapped value only appears in the resulting dataframe.
> {code:java}
> In [30]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})
> In [31]: tf = pa.Table.from_pandas(df)
> In [32]: tf.columns
> Out[32]: [<pyarrow.lib.ChunkedArray object at 0x7f23884deef8>
>  [
>    [
>      2262-04-12 00:00:00.000000
>    ]
>  ]
> ]
> In [33]: tf.to_pandas()
> Out[33]:                      A
> 0 1677-09-21 00:25:26.290448384
> {code}
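
The wrapped value above is consistent with a signed 64-bit overflow when a microsecond timestamp is multiplied by 1000 to get nanoseconds. The following is a minimal sketch of that arithmetic using only the standard library. It is an assumption about the cause, not a trace of the actual pyarrow code path, and the helper names (to_epoch_us, wrap_int64, fits_in_int64_ns) are hypothetical.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def to_epoch_us(dt):
    """Exact microseconds since the Unix epoch for a naive datetime."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def wrap_int64(value):
    """Reduce a Python int to a signed 64-bit value (two's complement)."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


def fits_in_int64_ns(dt):
    """True if dt can be stored as int64 nanoseconds since the epoch."""
    return INT64_MIN <= to_epoch_us(dt) * 1000 <= INT64_MAX


dt = datetime.datetime(2262, 4, 12)

# Simulate an unchecked microsecond -> nanosecond conversion in int64.
ns = wrap_int64(to_epoch_us(dt) * 1000)

# Interpret the wrapped nanoseconds as a datetime again
# (floored to microseconds, the finest unit datetime can hold).
wrapped = EPOCH + datetime.timedelta(microseconds=ns // 1000)

print(fits_in_int64_ns(dt))
print(ns)
print(wrapped)
```

Under these assumptions the sketch lands on 1677-09-21 00:25:26.290448, which agrees with the value reported above down to the microsecond. A check along the lines of fits_in_int64_ns could be used to detect such values before calling to_pandas().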



--
This message was sent by Atlassian Jira
(v8.3.4#803005)