Posted to issues@spark.apache.org by "dc-heros (Jira)" <ji...@apache.org> on 2021/05/26 03:55:00 UTC

[jira] [Comment Edited] (SPARK-30696) Wrong result of the combination of from_utc_timestamp and to_utc_timestamp

    [ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351464#comment-17351464 ] 

dc-heros edited comment on SPARK-30696 at 5/26/21, 3:54 AM:
------------------------------------------------------------

fromUTCtime and toUTCtime produce wrong results on days when a Daylight Saving Time transition occurs.
 For example, in Los Angeles in 1960, the time zone switched from UTC-7 to UTC-8 at 2 AM on 1960-09-25, but the previous implementation placed the cutoff at 8 AM.

Because of this fall-back, a local time such as 1960-09-25 01:30:00 in Los Angeles corresponds to two UTC instants, 1960-09-25 08:30:00 and 1960-09-25 09:30:00, and fromUTCtime can only pick one of them, so these functions are wrong only around the cutoff time.
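As a standalone illustration (plain java.time, not Spark code; the object name DstAmbiguity and helper localInLA are hypothetical), this sketch shows that both UTC instants render as the same Los Angeles wall-clock time, which is why the reverse conversion cannot recover both:

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

object DstAmbiguity {
  val la: ZoneId = ZoneId.of("America/Los_Angeles")

  // Wall-clock time in Los Angeles for a given UTC instant.
  def localInLA(utc: Instant): LocalDateTime =
    LocalDateTime.ofInstant(utc, la)

  def main(args: Array[String]): Unit = {
    // 01:30 PDT (UTC-7), just before the 1960-09-25 02:00 fall-back
    println(localInLA(Instant.parse("1960-09-25T08:30:00Z"))) // 1960-09-25T01:30
    // 01:30 PST (UTC-8), just after the fall-back
    println(localInLA(Instant.parse("1960-09-25T09:30:00Z"))) // 1960-09-25T01:30
    // Both UTC instants map to the same ambiguous local time, so a
    // local-to-UTC conversion must pick one offset and loses the other.
  }
}
```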

Could you edit the description, [~maxgekk]?


was (Author: dc-heros):
fromUTCtime and toUTCtime produce wrong results on days when a Daylight Saving Time transition occurs.
For example, in Los Angeles in 1960, the time zone switched from UTC-7 to UTC-8 at 2 AM on 1960-09-25, but the previous implementation placed the cutoff at 8 AM.

Because of this fall-back, a local time such as 1960-09-25 01:30:00 in Los Angeles corresponds to both 1960-09-25 08:30:00 and 1960-09-25 09:30:00 in UTC, so these functions are wrong only around the cutoff time.

Could you edit the description, [~maxgekk]?

> Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
> --------------------------------------------------------------------------
>
>                 Key: SPARK-30696
>                 URL: https://issues.apache.org/jira/browse/SPARK-30696
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.0
>            Reporter: Max Gekk
>            Priority: Major
>
> Applying to_utc_timestamp() to the results of from_utc_timestamp() should return the original timestamp in the same time zone. Over a range of 100 years, the combination of these functions returns wrong results for 280 out of 1753200 timestamps:
> {code:java}
> scala> val SECS_PER_YEAR = (36525L * 24 * 60 * 60)/100
> SECS_PER_YEAR: Long = 31557600
> scala> val SECS_PER_MINUTE = 60L
> SECS_PER_MINUTE: Long = 60
> scala>  val tz = "America/Los_Angeles"
> tz: String = America/Los_Angeles
> scala> val df = spark.range(-50 * SECS_PER_YEAR, 50 * SECS_PER_YEAR, 30 * SECS_PER_MINUTE)
> df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
> scala> val diff = df.select((to_utc_timestamp(from_utc_timestamp($"id".cast("timestamp"), tz), tz).cast("long") - $"id").as("diff")).filter($"diff" !== 0)
> warning: there was one deprecation warning; re-run with -deprecation for details
> diff: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [diff: bigint]
> scala> diff.count
> res14: Long = 280
> scala> df.count
> res15: Long = 1753200
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org