Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/12/07 14:10:00 UTC

[jira] [Reopened] (ARROW-3703) [Python] DataFrame.to_parquet crashes if datetime column has time zones

     [ https://issues.apache.org/jira/browse/ARROW-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reopened ARROW-3703:
---------------------------------

> [Python] DataFrame.to_parquet crashes if datetime column has time zones
> -----------------------------------------------------------------------
>
>                 Key: ARROW-3703
>                 URL: https://issues.apache.org/jira/browse/ARROW-3703
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>         Environment: pandas 0.23.4
> pyarrow 0.11.1
> Python 2.7, 3.5 - 3.7
> MacOS High Sierra (10.13.6)
>            Reporter: Diego Argueta
>            Assignee: Krisztian Szucs
>            Priority: Major
>              Labels: parquet, pull-request-available
>             Fix For: 0.12.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> On CPython 2.7.15, 3.5.6, 3.6.6, and 3.7.0, a pandas DataFrame holding a {{datetime.datetime}} object serializes to Parquet just fine, but {{DataFrame.to_parquet}} fails with an {{AttributeError}} if the datetime uses one of the standard library's {{tzinfo}} objects, such as {{datetime.timezone}}.
> To reproduce, on Python 3:
> {code:python}
> import datetime as dt
> import pandas as pd
> df = pd.DataFrame({'foo': [dt.datetime(2018, 1, 1, 1, 23, 45, tzinfo=dt.timezone.utc)]})
> df.to_parquet('data.parq')
> {code}
>  
> On Python 2, create a subclass of {{datetime.tzinfo}} as shown [here|https://docs.python.org/2/library/datetime.html#datetime.tzinfo] and try the same thing.
>  
> The following exception results:
> {noformat}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/core/frame.py", line 1945, in to_parquet
>     compression=compression, **kwargs)
>   File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 257, in to_parquet
>     return impl.write(df, path, compression=compression, **kwargs)
>   File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 118, in write
>     table = self.api.Table.from_pandas(df)
>   File "pyarrow/table.pxi", line 1217, in pyarrow.lib.Table.from_pandas
>   File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 381, in dataframe_to_arrays
>     convert_types)]
>   File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 380, in <listcomp>
>     for c, t in zip(columns_to_convert,
>   File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 370, in convert_column
>     return pa.array(col, type=ty, from_pandas=True, safe=safe)
>   File "pyarrow/array.pxi", line 167, in pyarrow.lib.array
>   File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 409, in get_datetimetz_type
>     type_ = pa.timestamp(unit, tz)
>   File "pyarrow/types.pxi", line 1038, in pyarrow.lib.timestamp
>   File "pyarrow/types.pxi", line 955, in pyarrow.lib.tzinfo_to_string
> AttributeError: 'datetime.timezone' object has no attribute 'zone'
> 'datetime.timezone' object has no attribute 'zone'
> {noformat}
>  
> This doesn't happen if you use {{pytz.UTC}} as the timezone object.
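
The last frame of the traceback fails on a lookup of the {{zone}} attribute, which pytz timezone objects provide and the standard library's {{tzinfo}} objects do not. The sketch below illustrates that difference and one possible fallback. It is only an illustration under stated assumptions: {{tz_to_string}} and {{UTC}} are hypothetical names introduced here, the "+HH:MM" output format is an assumption, and none of this is pyarrow's actual implementation or fix. It needs Python 3 because it uses {{datetime.timezone}}.

```python
import datetime as dt


class UTC(dt.tzinfo):
    """Minimal tzinfo subclass, modeled on the example in the Python 2 docs."""

    def utcoffset(self, d):
        return dt.timedelta(0)

    def tzname(self, d):
        return "UTC"

    def dst(self, d):
        return dt.timedelta(0)


def tz_to_string(tz):
    """Hypothetical helper, NOT pyarrow's implementation.

    Prefer the pytz-style ``zone`` attribute when it exists; otherwise
    fall back to a fixed "+HH:MM" offset computed from ``utcoffset``.
    """
    zone = getattr(tz, "zone", None)
    if zone is not None:
        return zone
    # Fixed-offset tzinfo objects accept None here; others may return None.
    offset = tz.utcoffset(None)
    if offset is None:
        raise ValueError("tzinfo has no fixed UTC offset: %r" % (tz,))
    total_minutes = int(offset.total_seconds()) // 60
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    return "%s%02d:%02d" % (sign, hours, minutes)


# Standard-library timezone objects carry no ``zone`` attribute,
# which is the lookup that raises in the traceback above.
print(hasattr(dt.timezone.utc, "zone"))
print(tz_to_string(dt.timezone.utc))
print(tz_to_string(UTC()))
```

Using {{getattr}} with a default keeps the pytz path unchanged while letting fixed-offset {{tzinfo}} objects through; a {{tzinfo}} whose offset depends on the date would still need separate handling.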



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)