Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/06/15 14:44:00 UTC

[jira] [Updated] (ARROW-5912) [Python] conversion from datetime objects with mixed timezones should normalize to UTC

     [ https://issues.apache.org/jira/browse/ARROW-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-5912:
-----------------------------------------
    Labels: beginner timestamp  (was: beginner)

> [Python] conversion from datetime objects with mixed timezones should normalize to UTC
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-5912
>                 URL: https://issues.apache.org/jira/browse/ARROW-5912
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: beginner, timestamp
>
> Currently, when converting datetime objects with mixed timezones, each object is interpreted separately as its local (wall-clock) time:
> {code:python}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> ts_pd_paris = pd.Timestamp("1970-01-01 01:00", tz="Europe/Paris")
> >>> ts_pd_paris
> Timestamp('1970-01-01 01:00:00+0100', tz='Europe/Paris')
> >>> ts_pd_helsinki = pd.Timestamp("1970-01-01 02:00", tz="Europe/Helsinki")
> >>> ts_pd_helsinki
> Timestamp('1970-01-01 02:00:00+0200', tz='Europe/Helsinki')
> >>> a = pa.array([ts_pd_paris, ts_pd_helsinki])
> >>> a
> <pyarrow.lib.TimestampArray object at 0x7f7856c4a360>
> [
>   1970-01-01 01:00:00.000000,
>   1970-01-01 02:00:00.000000
> ]
> >>> a.type
> TimestampType(timestamp[us])
> {code}
> Both timestamps refer to the same moment in time (the same value in UTC; in pandas their stored {{value}} is also identical), but once converted to pyarrow they are both tz-naive and no longer represent the same time. That is rather unexpected and a likely source of bugs.
> I think a better option would be to normalize to UTC and return a tz-aware TimestampArray with UTC as its timezone.
> That is also the behaviour of pandas if you force the conversion to produce datetimes (by default pandas keeps them as an object array, preserving the different timezones).
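
The difference between the reported behaviour and the proposed one can be sketched with the standard library alone. This is only an illustration, not pyarrow's implementation: the fixed offsets PARIS and HELSINKI stand in for the two timezones on that date, and the helpers drop_tz and normalize_to_utc are hypothetical names introduced here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC offsets stand in for Europe/Paris (+01:00) and
# Europe/Helsinki (+02:00) on 1970-01-01, so no tz database is needed.
PARIS = timezone(timedelta(hours=1))
HELSINKI = timezone(timedelta(hours=2))

ts_paris = datetime(1970, 1, 1, 1, 0, tzinfo=PARIS)
ts_helsinki = datetime(1970, 1, 1, 2, 0, tzinfo=HELSINKI)


def drop_tz(values):
    """Keep each wall-clock time and discard its offset (the reported behaviour)."""
    return [v.replace(tzinfo=None) for v in values]


def normalize_to_utc(values):
    """Convert each tz-aware datetime to UTC (the proposed behaviour)."""
    return [v.astimezone(timezone.utc) for v in values]


naive = drop_tz([ts_paris, ts_helsinki])
utc = normalize_to_utc([ts_paris, ts_helsinki])

print(ts_paris == ts_helsinki)  # aware datetimes compare by instant
print(naive[0] == naive[1])     # wall-clock values compared without offsets
print(utc[0] == utc[1], utc[0].isoformat())
```

Until the conversion normalizes on its own, a possible workaround is to normalize the values in this way first and then request an explicit UTC type, for example pa.timestamp("us", tz="UTC"), when building the array.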


