Posted to issues@arrow.apache.org by "WillAyd (via GitHub)" <gi...@apache.org> on 2024/04/17 23:25:15 UTC

[I] PyArrow cast Unable to cast strings without Zone Offset [arrow]

WillAyd opened a new issue, #41268:
URL: https://github.com/apache/arrow/issues/41268

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ```python
   >>> import datetime
   >>> import pyarrow as pa
   >>> pa.array(["2024-01-01 05:00:00"]).cast(pa.timestamp("s"))
   <pyarrow.lib.TimestampArray object at 0x7c6c794463e0>
   [
     2024-01-01 05:00:00
   ]
   
   >>> pa.array([datetime.datetime(2024, 1, 1, 5, 0, 0)]).cast(pa.timestamp("s"))
   <pyarrow.lib.TimestampArray object at 0x7c6c79444b80>
   [
     2024-01-01 05:00:00
   ]
   
   >>> pa.array([datetime.datetime(2024, 1, 1, 5, 0, 0)]).cast(pa.timestamp("s", "UTC"))
   <pyarrow.lib.TimestampArray object at 0x7c6c7956d720>
   [
     2024-01-01 05:00:00Z
   ]
   
   >>> pa.array(["2024-01-01 05:00:00"]).cast(pa.timestamp("s", "UTC"))
   ArrowInvalid: Failed to parse string: '2024-01-01 05:00:00' as a scalar of type timestamp[s, tz=UTC]: expected a zone offset. If these timestamps are in local time, cast to timestamp without timezone, then call assume_timezone.
   ```
   
   I'm having a hard time figuring out the best way to construct timezone-aware arrays from strings rather than from Python datetime objects. Given the pattern set by the first three examples above, it seems like a bug that the fourth does not work.
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Python] PyArrow cast Unable to cast strings without Zone Offset [arrow]

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd commented on issue #41268:
URL: https://github.com/apache/arrow/issues/41268#issuecomment-2063605353

   Makes sense - thanks for the explanation


Re: [I] [Python] PyArrow cast Unable to cast strings without Zone Offset [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #41268:
URL: https://github.com/apache/arrow/issues/41268#issuecomment-2063190391

   Casting a string to a timestamp is essentially parsing the string (`strptime`), and we currently don't allow parsing a non-tz-aware string to a tz-aware timestamp. Doing so would require guessing whether the string is in local wall time or in UTC, i.e. whether it is a tz localize or a tz convert operation, in pandas terms.
   
   The other examples you give are parsing a non-tz-aware string to a non-tz-aware timestamp (no ambiguity, this works fine) and casting a non-tz-aware timestamp to a tz-aware timestamp. This last case is also potentially ambiguous, but the cast here is a very simple zero-copy operation that just changes the metadata of the timestamp type (to add a timezone), and thus treats the input as UTC (and not as local wall time, for which there is a specific kernel, `pc.assume_timezone`).
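
To make the localize-versus-convert distinction concrete, here is a minimal sketch using only the Python standard library rather than PyArrow. The fixed `+01:00` offset is an assumption standing in for Europe/Brussels in January, and the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +01:00 offset stands in for Europe/Brussels in January.
BRUSSELS_WINTER = timezone(timedelta(hours=1))

# The string carries no zone offset, so parsing yields a naive datetime.
naive = datetime.strptime("2024-01-01 05:00:00", "%Y-%m-%d %H:%M:%S")

# Reading 1 ("tz convert"): the value is already UTC; attach UTC,
# then view the same instant in the target zone.
as_utc = naive.replace(tzinfo=timezone.utc)
converted = as_utc.astimezone(BRUSSELS_WINTER)

# Reading 2 ("tz localize"): the value is local wall time in the target zone.
localized = naive.replace(tzinfo=BRUSSELS_WINTER)

print(converted.isoformat())
print(localized.astimezone(timezone.utc).isoformat())
# The two readings describe different instants.
print(as_utc - localized)
```

The same input string ends up one hour apart depending on the reading, which is the guess a string-to-tz-aware cast would have to make.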
   
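   And so parsing a non-tz-aware string to a tz-aware timestamp can always be done in two steps: first parsing / casting to a timestamp, and then converting to a tz-aware timestamp, either with a cast (which treats the values as UTC) or with `pc.assume_timezone` (which treats them as local wall time):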
   
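   ```python
   >>> import pyarrow as pa
   >>> import pyarrow.compute as pc
   >>> # Plain cast: the parsed values are treated as UTC
   >>> pa.array(["2024-01-01 05:00:00"]).cast(pa.timestamp("s")).cast(pa.timestamp("s", "Europe/Brussels"))
   <pyarrow.lib.TimestampArray object at 0x7f065c331960>
   [
     2024-01-01 05:00:00Z
   ]
   >>> # assume_timezone: the parsed values are treated as local wall time
   >>> pc.assume_timezone(pa.array(["2024-01-01 05:00:00"]).cast(pa.timestamp("s")), "Europe/Brussels")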
   <pyarrow.lib.TimestampArray object at 0x7f065c2d26e0>
   [
     2024-01-01 04:00:00Z
   ]
   ```
   
   


Re: [I] [Python] PyArrow cast Unable to cast strings without Zone Offset [arrow]

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.
WillAyd closed issue #41268: [Python] PyArrow cast Unable to cast strings without Zone Offset
URL: https://github.com/apache/arrow/issues/41268

