Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/09/06 12:55:00 UTC

[jira] [Assigned] (ARROW-1435) [Python] PyArrow not propagating timezone information from Parquet to Python

     [ https://issues.apache.org/jira/browse/ARROW-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-1435:
-----------------------------------

    Assignee: Wes McKinney

> [Python] PyArrow not propagating timezone information from Parquet to Python
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-1435
>                 URL: https://issues.apache.org/jira/browse/ARROW-1435
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.6.0
>            Reporter: Lucas Pickup
>            Assignee: Wes McKinney
>             Fix For: 0.7.0
>
>
> PyArrow reads timezone metadata for Timestamp values from Parquet, but this information is not propagated to the resulting Python datetime objects: a timezone-aware column comes back as timezone-naive after a write/read round trip.
> {noformat}
> λ python
> Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow as pa
> >>> import pyarrow.parquet as pq
> >>> import pytz
> >>> import pandas
> >>> from datetime import datetime
> >>>
> >>> d1 = datetime.strptime('2015-07-05 23:50:00', '%Y-%m-%d %H:%M:%S')
> >>> d1
> datetime.datetime(2015, 7, 5, 23, 50)
> >>> aware = pytz.utc.localize(d1)
> >>> aware
> datetime.datetime(2015, 7, 5, 23, 50, tzinfo=<UTC>)
> >>>
> >>> df = pandas.DataFrame()
> >>> df['DateNaive'] = [d1]
> >>> df['DateAware'] = [aware]
> >>> df
>             DateNaive                 DateAware
> 0 2015-07-05 23:50:00 2015-07-05 23:50:00+00:00
> >>>
> >>> table  = pa.Table.from_pandas(df)
> >>> table
> pyarrow.Table
> DateNaive: timestamp[ns]
> DateAware: timestamp[ns, tz=UTC]
> __index_level_0__: int64
> -- metadata --
> pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": "datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns": ["__index_level_0__"]}
> >>>
> >>> pq.write_table(table, "E:\\pyarrowDates.parquet")
> >>>
> >>> pyarrowTable = pq.read_table("E:\\pyarrowDates.parquet")
> >>> pyarrowTable
> pyarrow.Table
> DateNaive: timestamp[us]
> DateAware: timestamp[us]
> __index_level_0__: int64
> -- metadata --
> pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": "datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns": ["__index_level_0__"]}
> >>>
> >>> pyarrowDF = pyarrowTable.to_pandas()
> >>> pyarrowDF
>             DateNaive           DateAware
> 0 2015-07-05 23:50:00 2015-07-05 23:50:00
> {noformat}
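
A possible stopgap, sketched under assumptions rather than taken from the eventual fix: the "pandas" schema metadata printed in the session above still records "timezone": "UTC" for DateAware after the round trip, so the zone could be re-attached by hand from that JSON. The helper names timezone_columns and reattach_utc and the dict-of-lists input are hypothetical and not part of PyArrow. The sketch also assumes the naive values read back are UTC wall times, and it only handles the UTC case.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the "pandas" schema metadata shown in the session above.
PANDAS_METADATA = json.dumps({
    "columns": [
        {"name": "DateNaive", "pandas_type": "datetime", "metadata": None},
        {"name": "DateAware", "pandas_type": "datetimetz",
         "metadata": {"timezone": "UTC"}},
    ]
})


def timezone_columns(pandas_metadata):
    """Map column name -> timezone name for columns that recorded one."""
    result = {}
    for column in json.loads(pandas_metadata)["columns"]:
        # "metadata" is null for naive columns, so fall back to an empty dict.
        column_metadata = column.get("metadata") or {}
        if "timezone" in column_metadata:
            result[column["name"]] = column_metadata["timezone"]
    return result


def reattach_utc(columns, pandas_metadata):
    """Return a copy of `columns` with UTC re-attached where the metadata says so.

    `columns` is a hypothetical dict of column name -> list of naive datetimes.
    Columns with no recorded zone, or a zone other than UTC, are left unchanged.
    """
    zones = timezone_columns(pandas_metadata)
    fixed = {}
    for name, values in columns.items():
        if zones.get(name) == "UTC":
            fixed[name] = [value.replace(tzinfo=timezone.utc) for value in values]
        else:
            fixed[name] = list(values)
    return fixed


naive = datetime(2015, 7, 5, 23, 50)
restored = reattach_utc({"DateNaive": [naive], "DateAware": [naive]}, PANDAS_METADATA)
print(restored["DateAware"][0].isoformat())
print(restored["DateNaive"][0].isoformat())
```

In a real session the JSON would come from the table read back from Parquet rather than a literal; how to get at it depends on the PyArrow version in use.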



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)