Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/03/09 13:50:00 UTC

[jira] [Created] (ARROW-15884) [C++][Doc] Document that the strptime kernel ignores %Z

Joris Van den Bossche created ARROW-15884:
---------------------------------------------

             Summary: [C++][Doc] Document that the strptime kernel ignores %Z
                 Key: ARROW-15884
                 URL: https://issues.apache.org/jira/browse/ARROW-15884
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Documentation
            Reporter: Joris Van den Bossche


After ARROW-12820, the {{strptime}} kernel still ignores the {{%Z}} specifier (for timezone names). When a format uses it, any string in that position is accepted and silently discarded, so no timezone conversion is applied.

For example:

{code:python}
>>> import pyarrow.compute as pc

# the %z specifier now works (after ARROW-12820)
>>> pc.strptime(["2022-03-05 09:00:00+01"], format="%Y-%m-%d %H:%M:%S%z", unit="us")
<pyarrow.lib.TimestampArray object at 0x7f00c1dd21c0>
[
  2022-03-05 08:00:00.000000
]

# in theory this should give the same result, but %Z is still ignored
>>> pc.strptime(["2022-03-05 09:00:00 CET"], format="%Y-%m-%d %H:%M:%S %Z", unit="us")
<pyarrow.lib.TimestampArray object at 0x7f00c86d1ca0>
[
  2022-03-05 09:00:00.000000
]

# as a result, any garbage in the string is also ignored
>>> pc.strptime(["2022-03-05 09:00:00 blabla"], format="%Y-%m-%d %H:%M:%S %Z", unit="us")
<pyarrow.lib.TimestampArray object at 0x7f00c1db1ca0>
[
  2022-03-05 09:00:00.000000
]
{code}
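Until the limitation is documented or fixed, a caller could guard against this gotcha by rejecting formats that contain {{%Z}} before calling {{pc.strptime}}. The helper below is a hypothetical sketch, not part of pyarrow: it scans a strptime-style format string, skips {{%%}} escapes so a literal "%Z" is not flagged, and assumes simple two-character directives (no {{E}}/{{O}} modifiers).

{code:python}
def uses_tz_name_directive(fmt: str) -> bool:
    """Return True if a strptime-style format string contains %Z.

    Hypothetical helper (not a pyarrow API). Escaped percent signs are
    skipped, so "%%Z" (a literal "%Z") is not flagged.
    """
    i = 0
    while i < len(fmt) - 1:
        if fmt[i] == "%":
            if fmt[i + 1] == "Z":
                return True
            # Skip the whole two-character directive, including "%%".
            i += 2
        else:
            i += 1
    return False


# The format used in the CET/"blabla" examples is flagged; the %z one is not.
print(uses_tz_name_directive("%Y-%m-%d %H:%M:%S %Z"))  # True
print(uses_tz_name_directive("%Y-%m-%d %H:%M:%S%z"))   # False
{code}

A caller could run this check first and raise an error, instead of silently getting timestamps that were never converted.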

I don't think this is easy to fix, at least as long as we rely on the system strptime (see also https://github.com/apache/arrow/pull/11358#issue-1020404727). But we should at least document this limitation / gotcha.
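One possible user-side workaround, sketched here under stated assumptions, is to rewrite timezone names into numeric UTC offsets before parsing, so that the working {{%z}} specifier does the job. The mapping {{TZ_ABBREVIATIONS}} and the helper {{replace_tz_name}} are hypothetical, not part of pyarrow; abbreviations can be ambiguous, so the table has to be supplied by the user. To stay self-contained, the sketch parses with the standard library's {{datetime.strptime}}; the rewritten strings could in principle be passed to {{pc.strptime}} with a {{%z}} format as in the first example, assuming Arrow accepts the {{+HHMM}} offset form.

{code:python}
from datetime import datetime, timezone

# Hypothetical, user-supplied mapping from abbreviation to UTC offset.
# Names such as "IST" are ambiguous, so no universal table exists.
TZ_ABBREVIATIONS = {"UTC": "+0000", "CET": "+0100", "CEST": "+0200"}


def replace_tz_name(value: str) -> str:
    """Replace a trailing " <NAME>" timezone abbreviation with a numeric offset.

    Unknown names raise ValueError instead of being silently dropped.
    """
    head, sep, name = value.rpartition(" ")
    if not sep:
        raise ValueError(f"no timezone name in {value!r}")
    try:
        offset = TZ_ABBREVIATIONS[name]
    except KeyError:
        raise ValueError(f"unknown timezone name {name!r}") from None
    return f"{head}{offset}"


rewritten = replace_tz_name("2022-03-05 09:00:00 CET")  # "2022-03-05 09:00:00+0100"
parsed = datetime.strptime(rewritten, "%Y-%m-%d %H:%M:%S%z")
print(parsed.astimezone(timezone.utc))  # 2022-03-05 08:00:00+00:00
{code}

With this approach, "2022-03-05 09:00:00 blabla" raises a {{ValueError}} rather than parsing as 09:00.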



--
This message was sent by Atlassian Jira
(v8.20.1#820001)