Posted to issues@arrow.apache.org by "AlenkaF (via GitHub)" <gi...@apache.org> on 2023/03/02 13:56:24 UTC

[GitHub] [arrow] AlenkaF opened a new issue, #34412: [Python] Converting python array to TimestampArray with naive datetime and datetime with various timezones

AlenkaF opened a new issue, #34412:
URL: https://github.com/apache/arrow/issues/34412

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   When converting a Python list of `datetime` objects with mixed timezones into a pyarrow array, there are two points that seem incorrect or could be improved:
   
   - the values are converted to UTC, but the `tz` attribute of the resulting `timestamp` type defaults to the timezone of the first element in the array, which seems wrong to me;
   - a naive element can also appear alongside the timezone-aware ones, and it is silently assumed to be in UTC, which is not necessarily true (if the naive element comes first, the resulting type has no timezone at all).
   
   ```python
   >>> import zoneinfo
   >>> import datetime
   >>> import pyarrow as pa
   
   # Mixed timezones without naive datetime
   >>> data_mixed = [
   ...     datetime.datetime(2006, 1, 13, 12, 34, 56, 432539, tzinfo=zoneinfo.ZoneInfo(key='US/Eastern')),
   ...     datetime.datetime(2008, 1, 5, 5, 0, 0, 1000, tzinfo=datetime.timezone.utc),
   ...     datetime.datetime(2010, 8, 13, 5, 0, 0, 437699, tzinfo=zoneinfo.ZoneInfo(key='Europe/Moscow')),
   ... ]
   >>> pa.array(data_mixed)
   <pyarrow.lib.TimestampArray object at 0x11c6dd0a0>
   [
     2006-01-13 17:34:56.432539,
     2008-01-05 05:00:00.001000,
     2010-08-13 01:00:00.437699
   ]
   >>> pa.array(data_mixed).type
   TimestampType(timestamp[us, tz=US/Eastern])
   
   # Mixed timezones with naive datetime as the first element
   >>> data_mixed_with_naive_first = [
   ...     datetime.datetime(2007, 7, 13, 8, 23, 34, 123456),  # naive
   ...     datetime.datetime(2008, 1, 5, 5, 0, 0, 1000, tzinfo=datetime.timezone.utc),
   ...     None,
   ...     datetime.datetime(2006, 1, 13, 12, 34, 56, 432539, tzinfo=zoneinfo.ZoneInfo(key='US/Eastern')),
   ...     datetime.datetime(2010, 8, 13, 5, 0, 0, 437699, tzinfo=zoneinfo.ZoneInfo(key='Europe/Moscow')),
   ... ]
   >>> pa.array(data_mixed_with_naive_first)
   <pyarrow.lib.TimestampArray object at 0x11c6dd0a0>
   [
     2007-07-13 08:23:34.123456,
     2008-01-05 05:00:00.001000,
     null,
     2006-01-13 17:34:56.432539,
     2010-08-13 01:00:00.437699
   ]
   >>> pa.array(data_mixed_with_naive_first).type
   TimestampType(timestamp[us])
   
   # Mixed timezones with naive datetime not as first element
   >>> data_mixed_with_naive = [
   ...     datetime.datetime(2006, 1, 13, 12, 34, 56, 432539, tzinfo=zoneinfo.ZoneInfo(key='US/Eastern')),
   ...     datetime.datetime(2010, 8, 13, 5, 0, 0, 437699, tzinfo=zoneinfo.ZoneInfo(key='Europe/Moscow')),
   ...     datetime.datetime(2008, 1, 5, 5, 0, 0, 1000, tzinfo=datetime.timezone.utc),
   ...     datetime.datetime(2007, 7, 13, 8, 23, 34, 123456),  # naive
   ...     None,
   ... ]
   >>> pa.array(data_mixed_with_naive)
   <pyarrow.lib.TimestampArray object at 0x11c6dd0a0>
   [
     2006-01-13 17:34:56.432539,
     2010-08-13 01:00:00.437699,
     2008-01-05 05:00:00.001000,
     2007-07-13 08:23:34.123456,
     null
   ]
   >>> pa.array(data_mixed_with_naive).type
   TimestampType(timestamp[us, tz=US/Eastern])
   ```
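To double-check the first example, the printed values can be reproduced with the standard library alone by converting each element to UTC and dropping the zone. This is a sketch, not pyarrow's actual code path. To avoid depending on the system tz database, it replaces the `ZoneInfo` zones with fixed offsets that I assume match those particular dates (EST, UTC-05:00, in January 2006 and Moscow summer time, UTC+04:00, in August 2010).

```python
import datetime

# Fixed offsets standing in for ZoneInfo('US/Eastern') and ZoneInfo('Europe/Moscow');
# assumed valid only for the specific dates used below.
EST = datetime.timezone(datetime.timedelta(hours=-5), "EST")
MSD = datetime.timezone(datetime.timedelta(hours=4), "MSD")

data_mixed = [
    datetime.datetime(2006, 1, 13, 12, 34, 56, 432539, tzinfo=EST),
    datetime.datetime(2008, 1, 5, 5, 0, 0, 1000, tzinfo=datetime.timezone.utc),
    datetime.datetime(2010, 8, 13, 5, 0, 0, 437699, tzinfo=MSD),
]


def to_utc_naive(dt):
    """Convert an aware datetime to UTC and strip tzinfo, leaving the UTC wall-clock value."""
    return dt.astimezone(datetime.timezone.utc).replace(tzinfo=None)


stored = [to_utc_naive(dt) for dt in data_mixed]
for value in stored:
    print(value)
# 2006-01-13 17:34:56.432539
# 2008-01-05 05:00:00.001000
# 2010-08-13 01:00:00.437699
```

These match the values pyarrow prints, even though the type reports `tz=US/Eastern`. Attaching `datetime.timezone.utc` to the naive element from the third example and passing it through `to_utc_naive` returns it unchanged, which is exactly the UTC assumption that pyarrow makes for naive values.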
   
   I think that if datetime elements with mixed timezones are converted to UTC, the `tz` attribute should default to UTC as well.
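A rough sketch of how the `tz` could be chosen under this proposal: keep the zone when every element shares it, and fall back to `"UTC"` when the zones differ. `proposed_tz` is a hypothetical helper, not a pyarrow API. It compares zones by their `str()` name (so equivalent zones with different names count as different) and, for simplicity, only accepts timezone-aware input.

```python
import datetime
from typing import Iterable, Optional


def proposed_tz(values: Iterable[Optional[datetime.datetime]]) -> Optional[str]:
    """Hypothetical helper, not part of pyarrow: choose the `tz` of the
    resulting timestamp type as proposed in this issue.

    Nulls are skipped. If all elements share one zone name it is kept;
    if the names differ, "UTC" is returned because the stored values are
    already normalized to UTC.
    """
    zone_names = []
    for value in values:
        if value is None:
            continue  # nulls carry no timezone information
        if value.utcoffset() is None:
            raise ValueError("this sketch only handles timezone-aware datetimes")
        name = str(value.tzinfo)  # e.g. "US/Eastern" for a ZoneInfo, "UTC" for timezone.utc
        if name not in zone_names:
            zone_names.append(name)
    if not zone_names:
        return None  # only nulls: nothing to infer
    return zone_names[0] if len(zone_names) == 1 else "UTC"


utc = datetime.timezone.utc
est = datetime.timezone(datetime.timedelta(hours=-5))  # fixed offset; str() gives "UTC-05:00"
print(proposed_tz([datetime.datetime(2006, 1, 13, tzinfo=est), None]))  # UTC-05:00
print(proposed_tz([datetime.datetime(2006, 1, 13, tzinfo=est),
                   datetime.datetime(2008, 1, 5, tzinfo=utc)]))  # UTC
```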
   
   As for the case where a naive element is present, the conversion could raise an error advising the user either to add a timezone to the naive element or to make all elements naive.
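The suggested error could be a simple pre-conversion check like the following. `check_no_mixed_naive` and its message are hypothetical, only meant to illustrate the idea; it uses Python's own definition of a naive datetime (`utcoffset()` returns `None`) and ignores nulls.

```python
import datetime
from typing import Iterable, Optional


def check_no_mixed_naive(values: Iterable[Optional[datetime.datetime]]) -> None:
    """Hypothetical pre-conversion check, not a pyarrow API: raise instead of
    silently treating naive datetimes as UTC when they are mixed with
    timezone-aware ones."""
    seen_naive = seen_aware = False
    for index, value in enumerate(values):
        if value is None:
            continue  # nulls are allowed anywhere
        if value.utcoffset() is None:  # Python's definition of a naive datetime
            seen_naive = True
        else:
            seen_aware = True
        if seen_naive and seen_aware:
            raise ValueError(
                "Cannot mix naive and timezone-aware datetimes (first conflict "
                f"at index {index}); add a timezone to the naive values or make "
                "all values naive."
            )


utc = datetime.timezone.utc
check_no_mixed_naive([datetime.datetime(2007, 7, 13), None])  # all naive: passes
try:
    check_no_mixed_naive([datetime.datetime(2008, 1, 5, tzinfo=utc),
                          datetime.datetime(2007, 7, 13), None])
except ValueError as exc:
    print(exc)  # reports the conflict at index 1
```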
   
   ### Component(s)
   
   Python

