Posted to issues@spark.apache.org by "Maxim Gekk (Jira)" <ji...@apache.org> on 2020/01/24 20:48:00 UTC
[jira] [Commented] (SPARK-30632) to_timestamp() doesn't work with certain timezones
[ https://issues.apache.org/jira/browse/SPARK-30632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023263#comment-17023263 ]
Maxim Gekk commented on SPARK-30632:
------------------------------------
Spark 2.4 and earlier versions use SimpleDateFormat to parse timestamp strings. Unfortunately, that class doesn't support time zone IDs in a format like "America/Los_Angeles"; see [https://stackoverflow.com/questions/23242211/java-simpledateformat-parse-timezone-like-america-los-angeles]. Spark 3.0 has migrated to DateTimeFormatter, which doesn't have this issue. Porting the changes back to Spark 2.4 is risky and could destabilize it, IMHO.
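To illustrate the difference between the two JDK classes outside of Spark, here is a minimal sketch in plain Scala. It is not Spark's internal parsing code: the object name ZoneIdParsingSketch and the helpers parseLegacy and parseModern are hypothetical, and the pattern and input string are taken from the reporter's example. It also assumes Locale.US for both formatters.

```scala
import java.text.SimpleDateFormat
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import java.util.{Date, Locale}
import scala.util.Try

// Hypothetical helper object, for illustration only (not part of Spark).
object ZoneIdParsingSketch {
  // Pattern and input string from the reporter's example.
  val pattern = "yyyy-MM-dd HH:mm:ss.SSS z"
  val input = "2019-01-24 11:30:00.123 America/Los_Angeles"

  // Legacy API, the class Spark 2.4 relies on according to the comment above.
  def parseLegacy(text: String): Try[Date] =
    Try(new SimpleDateFormat(pattern, Locale.US).parse(text))

  // java.time API, the class Spark 3.0 migrated to according to the comment above.
  def parseModern(text: String): Try[ZonedDateTime] =
    Try(ZonedDateTime.parse(text, DateTimeFormatter.ofPattern(pattern, Locale.US)))

  def main(args: Array[String]): Unit = {
    // Each result is a Success or a Failure wrapping the parse exception.
    println(s"SimpleDateFormat:  ${parseLegacy(input)}")
    println(s"DateTimeFormatter: ${parseModern(input)}")
  }
}
```

On a standard JDK, parseLegacy should return a Failure for the region-style ID while parseModern should succeed, which matches the null seen in Spark 2.4 and the working result reported on master.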
> to_timestamp() doesn't work with certain timezones
> --------------------------------------------------
>
> Key: SPARK-30632
> URL: https://issues.apache.org/jira/browse/SPARK-30632
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.0, 2.4.4
> Reporter: Anton Daitche
> Priority: Major
>
> It seems that to_timestamp() doesn't work with timezones of the type <Region>/<City>, e.g. America/Los_Angeles.
> The code
> {code:scala}
> val df = Seq(
>   ("2019-01-24 11:30:00.123", "America/Los_Angeles"),
>   ("2020-01-01 01:30:00.123", "PST")
> ).toDF("ts_str", "tz_name")
> val ts_parsed = to_timestamp(
>   concat_ws(" ", $"ts_str", $"tz_name"), "yyyy-MM-dd HH:mm:ss.SSS z"
> ).as("timestamp")
> df.select(ts_parsed).show(false)
> {code}
> prints
> {code}
> +-------------------+
> |timestamp          |
> +-------------------+
> |null               |
> |2020-01-01 10:30:00|
> +-------------------+
> {code}
> So, the datetime string with timezone PST is properly parsed, whereas the one with America/Los_Angeles is converted to null. According to [this|https://github.com/apache/spark/pull/24195#issuecomment-578055146] response on GitHub, this code works when run on the recent master version.
> See also the discussion in [this|https://github.com/apache/spark/pull/24195#issue] issue for more context.