Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/06/16 08:02:25 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #36110: [C++][Python] Wrong result when converting time zones after 2038

jorisvandenbossche commented on issue #36110:
URL: https://github.com/apache/arrow/issues/36110#issuecomment-1594279063

   Digging a little further: the difference is partly due to a different timezone database, but mostly due to a different way of extrapolating into the future (the transition data in the database only goes up to a certain time, so the question is which UTC offset to use for a datetime after that). pytz and zoneinfo make different choices here: pytz simply propagates the last offset data point (and so ignores any future DST transitions), while zoneinfo uses additional information in the database to predict whether a future datetime falls in DST.
   See the answer in https://stackoverflow.com/questions/74520944/how-does-zoneinfo-handle-dst-in-the-distant-future
   
   As a small illustration (using some internals of pytz and zoneinfo):
   
   <details>
   
   ```python
   >>> from datetime import datetime, timezone
   >>> dt = datetime(2038, 4, 1, 3)
   
   # pytz's last offset data point for this timezone is in 2037, and it is a non-DST one
   >>> import pytz
   >>> tz_pytz = pytz.timezone("America/Boise")
   >>> print(tz_pytz._utc_transition_times[-1])
   2037-11-01 08:00:00
   >>> tz_pytz._transition_info[-1]
   (datetime.timedelta(days=-1, seconds=61200), datetime.timedelta(0), 'MST')
   # so we get no DST offset for any datetime in 2038 or later
   >>> tz_pytz.dst(dt)
   datetime.timedelta(0)
   >>> tz_pytz.utcoffset(dt)
   datetime.timedelta(days=-1, seconds=61200)  # UTC offset of -07:00
   >>> print(pytz.timezone("America/Boise").localize(dt))
   2038-04-01 03:00:00-07:00
   >>> print(pytz.timezone("America/Boise").localize(dt).astimezone(timezone.utc))
   2038-04-01 10:00:00+00:00
   
   # zoneinfo in my conda env only has transition data up to 2007, and the last offset is a DST one
   >>> from zoneinfo._zoneinfo import ZoneInfo  # importing the Python implementation (not the C one), so I can hack around
   >>> tz = ZoneInfo.no_cache("America/Boise")
   # note: fromtimestamp() without a tz argument prints in the machine's local time, not UTC
   >>> print(datetime.fromtimestamp(tz._trans_utc[-1]))
   2007-03-11 10:00:00
   >>> tz._ttinfos[-1]
   _ttinfo(-1 day, 18:00:00, 1:00:00, MDT)
   
   # we can let zoneinfo use the data from pytz
   >>> from zoneinfo._tzpath import reset_tzpath
   >>> reset_tzpath(("/home/joris/miniconda3/envs/arrow-dev/lib/python3.10/site-packages/pytz/zoneinfo", ))
   >>> tz = ZoneInfo.no_cache("America/Boise")
   >>> print(datetime.fromtimestamp(tz._trans_utc[-1]))
   2037-11-01 09:00:00
   >>> tz._ttinfos[-1]
   _ttinfo(-1 day, 17:00:00, 0:00:00, MST)
   
   # the last offset is now a non-DST one, but because zoneinfo uses a rule to determine DST for
   # future datetimes, 2038-04-01 still gets a DST offset
   >>> tz.dst(dt)
   datetime.timedelta(seconds=3600)
   >>> tz.utcoffset(dt)
   datetime.timedelta(days=-1, seconds=64800)  # UTC offset of -06:00
   >>> print(dt.replace(tzinfo=tz))
   2038-04-01 03:00:00-06:00
   >>> print(dt.replace(tzinfo=tz).astimezone(timezone.utc))
   2038-04-01 09:00:00+00:00
   
   # but with a small hack we can disable this "rule-based" determination for future datetimes and let it
   # use the last offset data point, similar to the logic in pytz. Now we get the same result as pytz:
   >>> tz._tz_after = tz._ttinfos[-1]
   >>> tz.dst(dt)
   datetime.timedelta(0)   # no DST offset
   >>> tz.utcoffset(dt)
   datetime.timedelta(days=-1, seconds=61200)  # UTC offset of -07:00
   >>> print(dt.replace(tzinfo=tz))
   2038-04-01 03:00:00-07:00
   >>> print(dt.replace(tzinfo=tz).astimezone(timezone.utc))
   2038-04-01 10:00:00+00:00
   ```
   
   </details>
   
   So that explains the different UTC value we get depending on whether the Python datetime object uses a `pytz` or a `zoneinfo` timezone (which also explains https://github.com/apache/arrow/issues/15047#issuecomment-1593598589).
   Of course, we also saw different behaviour for our own `assume_timezone` kernel. Based on the result, I assume it follows the same logic as pytz (extending the last offset into the future) and doesn't support rule-based DST determination for future dates. This seems to be confirmed by the comment at https://github.com/HowardHinnant/date/issues/563#issuecomment-607439821
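
To make the two extrapolation strategies concrete without relying on any tz database, here is a minimal sketch. It hard-codes the America/Boise offsets shown above and assumes the current US DST rule (second Sunday of March to first Sunday of November, switching at 02:00 local time), which is what a POSIX TZ string such as `MST7MDT,M3.2.0,M11.1.0` encodes. The helper names are made up for illustration; neither pytz nor zoneinfo is implemented this way.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for America/Boise, taken from the output above.
STD_OFFSET = timedelta(hours=-7)  # MST
DST_OFFSET = timedelta(hours=-6)  # MDT


def nth_sunday(year, month, n):
    """Return midnight on the n-th Sunday of the given month."""
    first = datetime(year, month, 1)
    # weekday(): Monday == 0 ... Sunday == 6
    days_to_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))


def offset_rule_based(local_dt):
    """Extrapolate with a recurring rule (assumed US rule, see lead-in)."""
    start = nth_sunday(local_dt.year, 3, 2).replace(hour=2)
    end = nth_sunday(local_dt.year, 11, 1).replace(hour=2)
    return DST_OFFSET if start <= local_dt < end else STD_OFFSET


def offset_last_transition(local_dt):
    """Extrapolate by repeating the offset of the last stored transition.

    The last transition (November 2037) switches to standard time, so
    every later datetime gets the standard offset.
    """
    return STD_OFFSET


def to_utc(local_dt, offset):
    """Attach a fixed offset to a naive local datetime and convert to UTC."""
    return local_dt.replace(tzinfo=timezone(offset)).astimezone(timezone.utc)


dt = datetime(2038, 4, 1, 3)
print(to_utc(dt, offset_rule_based(dt)))       # 2038-04-01 09:00:00+00:00
print(to_utc(dt, offset_last_transition(dt)))  # 2038-04-01 10:00:00+00:00
```

The one-hour difference between the two printed values is the same discrepancy as between the zoneinfo and pytz results in the session above.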

