Posted to issues@arrow.apache.org by "Kevin Glasson (Jira)" <ji...@apache.org> on 2020/02/14 06:04:00 UTC
[jira] [Updated] (ARROW-7856) [Python] to_pandas() causing datetimes > pd.Timestamp.max to wrap around
[ https://issues.apache.org/jira/browse/ARROW-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Glasson updated ARROW-7856:
---------------------------------
Summary: [Python] to_pandas() causing datetimes > pd.Timestamp.max to wrap around (was: to_pandas() Causing datetimes > pd.Timestamp.max to wrap around)
> [Python] to_pandas() causing datetimes > pd.Timestamp.max to wrap around
> ------------------------------------------------------------------------
>
> Key: ARROW-7856
> URL: https://issues.apache.org/jira/browse/ARROW-7856
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.15.1
> Environment: Distributor ID: Ubuntu
> Description: Ubuntu 18.04.4 LTS
> Release: 18.04
> Codename: bionic
> Python 3.7.3
> In [3]: pa.__version__
> Out[3]: '0.15.1'
> In [4]: pd.__version__
> Out[4]: '0.25.2'
> Reporter: Kevin Glasson
> Priority: Major
>
> When writing a dataframe containing `datetime.datetime` objects in an object column, any datetime greater than pd.Timestamp.max or less than pd.Timestamp.min is wrapped around when the data is read back into pandas.
>
> For reference, these are the timestamp max and min values:
>
> {code:python}
> In [43]: pd.Timestamp.max
> Out[43]: Timestamp('2262-04-11 23:47:16.854775807')
> In [44]: pd.Timestamp.min
> Out[44]: Timestamp('1677-09-21 00:12:43.145225')
> {code}
>
>
> To reproduce the error using pandas:
>
> {code:python}
> In [49]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})
> In [50]: df
> Out[50]:
> A
> 0 2262-04-12 00:00:00
> In [51]: df.to_parquet("datetimething.parquet")
> In [52]: pd.read_parquet("datetimething.parquet")
> Out[52]:
> A
> 0 1677-09-21 00:25:26.290448384
> {code}
> I have narrowed it down to the point where a `pa.Table` is converted with the `to_pandas()` method:
> {code:python}
> In [30]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})
> In [31]: tf = pa.Table.from_pandas(df)
> In [32]: tf.columns
> Out[32]: [<pyarrow.lib.ChunkedArray object at 0x7f23884deef8>
> [
> [
> 2262-04-12 00:00:00.000000
> ]
> ]
> ]
> In [33]: tf.to_pandas()
> Out[33]: A
> 0 1677-09-21 00:25:26.290448384
> {code}
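The wrapped value is consistent with an unchecked 64-bit integer overflow. pandas stores `datetime64[ns]` values as int64 nanoseconds since the Unix epoch, which is where pd.Timestamp.max comes from, and the Arrow column appears to hold microseconds (note the six fractional digits in the `tf.columns` output above). Assuming the conversion multiplies by 1000 without a range check (an assumption; the report does not identify the code path), the sketch below emulates that multiply with two's-complement wraparound in pure Python. `wrap_int64` and `ns_to_datetime` are hypothetical helpers, and `datetime` only has microsecond resolution, so the last three nanosecond digits are dropped.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1

def wrap_int64(x: int) -> int:
    """Reduce x to a signed 64-bit value, as overflowing int64 arithmetic would."""
    return (x + 2**63) % 2**64 - 2**63

def ns_to_datetime(ns: int) -> datetime:
    """Nanoseconds since the epoch -> datetime, floored to whole microseconds."""
    return EPOCH + timedelta(microseconds=ns // 1000)

# The largest int64 nanosecond count is pd.Timestamp.max (to the microsecond).
print(ns_to_datetime(INT64_MAX))   # 2262-04-11 23:47:16.854775

# 2262-04-12 00:00:00 as microseconds since the epoch: fits easily in int64.
us = (datetime(2262, 4, 12) - EPOCH) // timedelta(microseconds=1)

# Scaling to nanoseconds exceeds INT64_MAX ...
ns_exact = us * 1000
print(ns_exact > INT64_MAX)        # True

# ... so a 64-bit multiply silently wraps to a large negative number.
ns_wrapped = wrap_int64(ns_exact)
print(ns_wrapped)                  # -9223371273709551616
print(ns_to_datetime(ns_wrapped))  # 1677-09-21 00:25:26.290448
```

The emulated result matches the `1677-09-21 00:25:26.290448384` reported above up to the truncated nanoseconds, which supports, but does not prove, the overflow explanation.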
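Until the conversion handles out-of-range values, one way to catch affected rows before writing them is to compare each value in the object column against the pd.Timestamp bounds printed in the report. This is a minimal pure-Python sketch, not a pandas or pyarrow API: `out_of_ns_range` is a hypothetical helper, and the bounds are the reported pd.Timestamp.min and pd.Timestamp.max truncated to `datetime`'s microsecond resolution.

```python
from datetime import datetime

# pd.Timestamp.min / pd.Timestamp.max as printed in the report,
# truncated to microseconds (datetime cannot hold nanoseconds).
NS_MIN = datetime(1677, 9, 21, 0, 12, 43, 145225)
NS_MAX = datetime(2262, 4, 11, 23, 47, 16, 854775)

def out_of_ns_range(values):
    """Return the indices of datetime values outside the datetime64[ns]
    range, i.e. the ones at risk of wrapping around on conversion.
    Non-datetime entries (e.g. None) are ignored."""
    return [
        i
        for i, v in enumerate(values)
        if isinstance(v, datetime) and not (NS_MIN <= v <= NS_MAX)
    ]

column = [datetime(2020, 1, 1), datetime(2262, 4, 12), datetime(1600, 1, 1)]
print(out_of_ns_range(column))  # [1, 2]
```

Flagged rows could then be dropped, clipped, or stored in another form; which of these is appropriate depends on the data and is left open here.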
--
This message was sent by Atlassian Jira
(v8.3.4#803005)