Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/11/23 19:21:00 UTC
[jira] [Commented] (ARROW-10343) [C++] Unable to parse strings into timestamps

    [ https://issues.apache.org/jira/browse/ARROW-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448236#comment-17448236 ] 

Antoine Pitrou commented on ARROW-10343:
----------------------------------------

Timezone indicators in timestamp strings are supported as of 6.0.0:
{code:python}
>>> us_tz_arr
<pyarrow.lib.StringArray object at 0x7f7ad653b820>
[
  "2014-12-07 07:48:59.285332+00",
  "2014-12-07 08:01:49.758975+00",
  "2014-12-07 10:11:35.884304+00"
]
>>> us_tz_arr.cast(pa.timestamp('us', tz=pytz.UTC))
<pyarrow.lib.TimestampArray object at 0x7f7ad54b6e20>
[
  2014-12-07 07:48:59.285332,
  2014-12-07 08:01:49.758975,
  2014-12-07 10:11:35.884304
]
{code}
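
For anyone who cannot upgrade to 6.0.0 yet, one possible workaround is to parse the strings outside of Arrow first. The following is only a minimal sketch using the Python standard library, not something Arrow provides: {{parse_pg_timestamp}} and {{to_epoch_micros}} are hypothetical helper names, and the sketch assumes the input always looks like the Postgres export above (a date, a time, optional fractional seconds, and an hour-only offset such as "+00").

```python
import re
from datetime import datetime, timezone

# Matches a trailing hour-only UTC offset such as "+00" or "-05".
_HOUR_ONLY_OFFSET = re.compile(r"([+-]\d{2})$")


def parse_pg_timestamp(text):
    """Parse 'YYYY-MM-DD HH:MM:SS[.ffffff]+HH' into an aware UTC datetime.

    Hypothetical helper: assumes the string always carries a time part.
    """
    # Pad "+00" to "+0000" so that strptime's %z directive accepts it.
    padded = _HOUR_ONLY_OFFSET.sub(r"\g<1>00", text)
    if "." in padded:
        fmt = "%Y-%m-%d %H:%M:%S.%f%z"
    else:
        fmt = "%Y-%m-%d %H:%M:%S%z"
    return datetime.strptime(padded, fmt).astimezone(timezone.utc)


def to_epoch_micros(text):
    """Return integer microseconds since the Unix epoch (UTC)."""
    delta = parse_pg_timestamp(text) - datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer arithmetic avoids the rounding that float timestamps can introduce.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
```

The resulting integers should be usable to build a {{timestamp('us', tz='UTC')}} array directly, although that step is not shown here.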


> [C++] Unable to parse strings into timestamps
> ---------------------------------------------
>
>                 Key: ARROW-10343
>                 URL: https://issues.apache.org/jira/browse/ARROW-10343
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 1.0.1
>         Environment: macOS 10.15.7, Python 3.8.2
>            Reporter: Niclas Roos
>            Priority: Minor
>              Labels: timestamp, timezone
>
> Hi,
> I'm working with Parquet files generated by an AWS RDS Postgres snapshot export.
> I'm trying to parse a date column stored as a string into a timestamp, but it fails.
> I've managed to parse the same date format (as in the first example below) when reading from a CSV, so I tried to investigate it as far as I could on my own, and here are my results:
> {code:python}
> import pyarrow as pa
> import pytz
> #################################################################################
> ## the format I get from the database
> us_tz_arr = pa.array([
>   "2014-12-07 07:48:59.285332+00",
>   "2014-12-07 08:01:49.758975+00",
>   "2014-12-07 10:11:35.884304+00"])
> us_tz_arr.cast(pa.timestamp('us', tz=pytz.UTC))
> -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35.884304+00
> #################################################################################
> ## tried removing the timezone
> us_arr = pa.array([
>   "2014-12-07 07:48:59.285332",
>   "2014-12-07 08:01:49.758975",
>   "2014-12-07 10:11:35.884304"])
> us_arr.cast(pa.timestamp('us'))
> -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35.884304
> #################################################################################
> ## tried removing the microseconds but keeping the timezone
> second_tz_arr = pa.array([
>   "2014-12-07 07:48:59+00",
>   "2014-12-07 08:01:49+00",
>   "2014-12-07 10:11:35+00"])
> second_tz_arr.cast(pa.timestamp('s', tz=pytz.UTC))
> -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35+00
> #################################################################################
> ## removing microseconds and timezone, makes it work!
> s_arr = pa.array([
>   "2014-12-07 07:48:59",
>   "2014-12-07 08:01:49",
>   "2014-12-07 10:11:35"])
> s_arr.cast(pa.timestamp('s'))
> -> <pyarrow.lib.TimestampArray object at 0x7fbdf81ae460>
> [
>   2014-12-07 07:48:59,
>   2014-12-07 08:01:49,
>   2014-12-07 10:11:35
> ]{code}
>  PS. This is my first bug report, so apologies if important things are missing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)