Posted to issues@spark.apache.org by "Maxim Gekk (Jira)" <ji...@apache.org> on 2020/01/24 21:13:00 UTC

[jira] [Comment Edited] (SPARK-30632) to_timestamp() doesn't work with certain timezones

    [ https://issues.apache.org/jira/browse/SPARK-30632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023263#comment-17023263 ] 

Maxim Gekk edited comment on SPARK-30632 at 1/24/20 9:12 PM:
-------------------------------------------------------------

Spark 2.4 and earlier versions use SimpleDateFormat to parse timestamp strings. Unfortunately, that class cannot parse time zones given as region IDs such as "America/Los_Angeles"; see [https://stackoverflow.com/questions/23242211/java-simpledateformat-parse-timezone-like-america-los-angeles]. Spark 3.0 has migrated to DateTimeFormatter, which doesn't have this issue. Porting the changes back to Spark 2.4 is risky and would destabilize it, IMHO. One of the reasons is that it requires switching the calendar system to the Proleptic Gregorian calendar; see https://issues.apache.org/jira/browse/SPARK-26651
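
The difference between the two classes can be checked outside Spark with plain JDK calls. The following is only a minimal sketch, not Spark's actual parsing code: the object name ZoneIdParsingSketch and the helpers parseLegacy and parseModern are made up for illustration, and Locale.US is an assumption to keep zone-name handling predictable. It feeds the pattern from the issue description to both parsers and prints whether each one succeeds.

```scala
import java.text.SimpleDateFormat
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import java.util.{Date, Locale}
import scala.util.Try

// Hypothetical helper object for illustration; not part of Spark.
object ZoneIdParsingSketch {
  // Same pattern as in the issue description.
  val Pattern = "yyyy-MM-dd HH:mm:ss.SSS z"

  // Legacy parser: the class used by Spark 2.4 and earlier.
  // A new instance per call, because SimpleDateFormat is not thread-safe.
  def parseLegacy(s: String): Try[Date] =
    Try(new SimpleDateFormat(Pattern, Locale.US).parse(s))

  // java.time parser: the class Spark 3.0 migrated to.
  def parseModern(s: String): Try[ZonedDateTime] =
    Try(ZonedDateTime.parse(s, DateTimeFormatter.ofPattern(Pattern, Locale.US)))

  def main(args: Array[String]): Unit = {
    val inputs = Seq(
      "2019-01-24 11:30:00.123 America/Los_Angeles",
      "2020-01-01 01:30:00.123 PST"
    )
    inputs.foreach { s =>
      // Print Success/Failure for each parser instead of assuming the outcome.
      println(s"input:             $s")
      println(s"SimpleDateFormat:  ${parseLegacy(s)}")
      println(s"DateTimeFormatter: ${parseModern(s)}")
    }
  }
}
```

If the behaviour described above holds on your JVM, the SimpleDateFormat line for the America/Los_Angeles input should show a Failure wrapping a ParseException, while the DateTimeFormatter line shows a Success.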


was (Author: maxgekk):
Spark 2.4 and earlier versions use SimpleDateFormat to parse timestamp strings. Unfortunately, that class cannot parse time zones given as region IDs such as "America/Los_Angeles"; see [https://stackoverflow.com/questions/23242211/java-simpledateformat-parse-timezone-like-america-los-angeles]. Spark 3.0 has migrated to DateTimeFormatter, which doesn't have this issue. Porting the changes back to Spark 2.4 is risky and would destabilize it, IMHO.

> to_timestamp() doesn't work with certain timezones
> --------------------------------------------------
>
>                 Key: SPARK-30632
>                 URL: https://issues.apache.org/jira/browse/SPARK-30632
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0, 2.4.4
>            Reporter: Anton Daitche
>            Priority: Major
>
> It seems that to_timestamp() doesn't work with timezones of the type <Region>/<City>, e.g. America/Los_Angeles.
> The code
> {code:scala}
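> // Needed outside spark-shell; assumes a SparkSession named `spark` is in scope.
> import org.apache.spark.sql.functions.{concat_ws, to_timestamp}
> import spark.implicits._
>
> // One row with a region-based zone ID, one with a zone abbreviation.
> val df = Seq(
>     ("2019-01-24 11:30:00.123", "America/Los_Angeles"),
>     ("2020-01-01 01:30:00.123", "PST")
> ).toDF("ts_str", "tz_name")
> // Append the zone name to the timestamp string and parse it with the 'z' pattern letter.
> val ts_parsed = to_timestamp(
>     concat_ws(" ", $"ts_str", $"tz_name"), "yyyy-MM-dd HH:mm:ss.SSS z"
> ).as("timestamp")
> df.select(ts_parsed).show(false)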
> {code}
> prints
> {code}
> +-------------------+
> |timestamp          |
> +-------------------+
> |null               |
> |2020-01-01 10:30:00|
> +-------------------+
> {code}
> So, the datetime string with timezone PST is properly parsed, whereas the one with America/Los_Angeles is converted to null. According to [this|https://github.com/apache/spark/pull/24195#issuecomment-578055146] response on GitHub, this code works when run on the recent master version. 
> See also the discussion in [this|https://github.com/apache/spark/pull/24195#issue] issue for more context.


