Posted to issues@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/05/14 12:19:00 UTC

[jira] [Updated] (ARROW-5359) [Python] timestamp_as_object support for pa.Table.to_pandas in pyarrow

     [ https://issues.apache.org/jira/browse/ARROW-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-5359:
----------------------------------
    Labels: pull-request-available  (was: )

> [Python] timestamp_as_object support for pa.Table.to_pandas in pyarrow
> ----------------------------------------------------------------------
>
>                 Key: ARROW-5359
>                 URL: https://issues.apache.org/jira/browse/ARROW-5359
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.13.0
>         Environment: Ubuntu
>            Reporter: Joe Muruganandam
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Creating a ticket for the issue reported on GitHub ([https://github.com/apache/arrow/issues/4284]).
> h2. pyarrow (Issue with timestamp conversion from arrow to pandas)
> pyarrow's Table.to_pandas has a date_as_object option but no similar option for timestamps. When a timestamp column in an Arrow table is converted to pandas, the target data type is pd.Timestamp, which cannot represent times later than 2262-04-11 23:47:16.854775807. As a result, in the scenario below the dates are converted to incorrect values. Adding a timestamp_as_object option to pa.Table.to_pandas would help in this scenario.
> #Python(3.6.8)
> import pandas as pd
> import pyarrow as pa
> pd.__version__
> '0.24.1'
> pa.__version__
> '0.13.0'
> import datetime
> df = pd.DataFrame({"test_date": [datetime.datetime(3000, 12, 31, 12, 0), datetime.datetime(3100, 12, 31, 12, 0)]})
> df
> test_date
> 0 3000-12-31 12:00:00
> 1 3100-12-31 12:00:00
> pa_table = pa.Table.from_pandas(df)
> pa_table[0]
> Column name='test_date' type=TimestampType(timestamp[us])
> [
> [
> 32535172800000000,
> 35690846400000000
> ]
> ]
> pa_table.to_pandas()
> test_date
> 0 1831-11-22 12:50:52.580896768
> 1 1931-11-22 12:50:52.580896768
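
The incorrect dates in the report are consistent with a signed 64-bit overflow when the microsecond values are scaled to nanoseconds. The following is a minimal sketch of that arithmetic using only the Python standard library. It assumes the conversion multiplies by 1000 in an int64 without an overflow check; that is an inference from the reported output, not a statement about the actual pyarrow or pandas code path. The helper names (wrap_int64, micros_since_epoch, wrapped_datetime) are hypothetical and exist only for this illustration.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def wrap_int64(value):
    """Reinterpret an arbitrary Python int as a two's-complement int64."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def micros_since_epoch(dt):
    """Microseconds between the Unix epoch and a naive datetime."""
    return (dt - EPOCH) // datetime.timedelta(microseconds=1)


def wrapped_datetime(dt):
    """Simulate scaling microseconds to nanoseconds in a signed 64-bit integer."""
    nanos = wrap_int64(micros_since_epoch(dt) * 1000)
    # datetime has microsecond resolution, so the last three digits are dropped.
    return EPOCH + datetime.timedelta(microseconds=nanos // 1000)


original = datetime.datetime(3000, 12, 31, 12, 0)
print(micros_since_epoch(original))  # 32535172800000000, as stored in the Arrow column
print(wrapped_datetime(original))    # 1831-11-22 12:50:52.580896
```

Under this assumption, both rows of the report are reproduced down to the microsecond: 3000-12-31 12:00 maps to 1831-11-22 12:50:52.580896 and 3100-12-31 12:00 maps to 1931-11-22 12:50:52.580896.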


