Posted to dev@arrow.apache.org by "Lucas Pickup (JIRA)" <ji...@apache.org> on 2017/08/30 19:46:00 UTC

[jira] [Created] (ARROW-1435) PyArrow not propagating timezone information from Parquet to Python

Lucas Pickup created ARROW-1435:
-----------------------------------

             Summary: PyArrow not propagating timezone information from Parquet to Python
                 Key: ARROW-1435
                 URL: https://issues.apache.org/jira/browse/ARROW-1435
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.6.0
            Reporter: Lucas Pickup


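PyArrow reads timezone metadata for Timestamp values from Parquet, but this information isn't propagated through to the resulting Python datetime objects. In the session below, the DateAware column is written as timestamp[ns, tz=UTC], yet it comes back from read_table as timestamp[us] with no timezone, and to_pandas() returns naive values, even though the pandas metadata still records the timezone as UTC.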

{noformat}
λ python
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pytz
>>> import pandas
>>> from datetime import datetime
>>>
>>> d1 = datetime.strptime('2015-07-05 23:50:00', '%Y-%m-%d %H:%M:%S')
>>> d1
datetime.datetime(2015, 7, 5, 23, 50)
>>> aware = pytz.utc.localize(d1)
>>> aware
datetime.datetime(2015, 7, 5, 23, 50, tzinfo=<UTC>)
>>>
>>> df = pandas.DataFrame()
>>> df['DateNaive'] = [d1]
>>> df['DateAware'] = [aware]
>>> df
            DateNaive                 DateAware
0 2015-07-05 23:50:00 2015-07-05 23:50:00+00:00
>>>
>>> table  = pa.Table.from_pandas(df)
>>> table
pyarrow.Table
DateNaive: timestamp[ns]
DateAware: timestamp[ns, tz=UTC]
__index_level_0__: int64
-- metadata --
pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": "datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns": ["__index_level_0__"]}
>>>
>>> pq.write_table(table, "E:\\pyarrowDates.parquet")
>>>
>>> pyarrowTable = pq.read_table("E:\\pyarrowDates.parquet")
>>> pyarrowTable
pyarrow.Table
DateNaive: timestamp[us]
DateAware: timestamp[us]
__index_level_0__: int64
-- metadata --
pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": "DateAware", "pandas_type": "datetimetz", "numpy_type": "datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns": ["__index_level_0__"]}
>>>
>>> pyarrowDF = pyarrowTable.to_pandas()
>>> pyarrowDF
            DateNaive           DateAware
0 2015-07-05 23:50:00 2015-07-05 23:50:00

{noformat}
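
Since the pandas metadata shown above still carries the timezone after the round trip, one possible stopgap is to read it back from there. The following is a minimal sketch under stated assumptions, not PyArrow's own behavior or fix: the metadata dictionary is an abbreviated copy of the output above, and timezone_columns and reattach_utc are hypothetical helper names introduced only for illustration. It uses only the standard library.

```python
import json
from datetime import datetime, timezone

# Abbreviated copy of the "pandas" metadata printed in the report above.
# Only the two timestamp columns are kept; key names match that output.
PANDAS_METADATA = json.dumps({
    "columns": [
        {"name": "DateNaive", "pandas_type": "datetime",
         "numpy_type": "datetime64[ns]", "metadata": None},
        {"name": "DateAware", "pandas_type": "datetimetz",
         "numpy_type": "datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}},
    ],
    "index_columns": ["__index_level_0__"],
})


def timezone_columns(pandas_metadata_json):
    """Hypothetical helper: map column name -> timezone name for datetimetz columns."""
    meta = json.loads(pandas_metadata_json)
    result = {}
    for col in meta.get("columns", []):
        col_meta = col.get("metadata") or {}  # "metadata" may be null
        if col.get("pandas_type") == "datetimetz" and "timezone" in col_meta:
            result[col["name"]] = col_meta["timezone"]
    return result


def reattach_utc(naive):
    """Hypothetical helper: label a naive datetime as UTC without shifting it.

    Assumes the naive value already represents a UTC instant.
    """
    return naive.replace(tzinfo=timezone.utc)


tz_by_column = timezone_columns(PANDAS_METADATA)
restored = reattach_utc(datetime(2015, 7, 5, 23, 50))
print(tz_by_column)
print(restored.isoformat())
```

With a real DataFrame, the resulting mapping could be used to re-localize each listed column, for example with pandas' Series.dt.tz_localize, assuming the values read back are UTC instants.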



