Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/07/26 04:30:00 UTC
[jira] [Comment Edited] (SPARK-28515) to_timestamp returns null for summer time switch dates
[ https://issues.apache.org/jira/browse/SPARK-28515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893318#comment-16893318 ]
Hyukjin Kwon edited comment on SPARK-28515 at 7/26/19 4:29 AM:
---------------------------------------------------------------
Which Python version do you use? IIRC, Python 3.4 and Python 3.5 have an issue with folded times in DST.
was (Author: hyukjin.kwon):
Which Python version do you use? IIRC, Python 3.4 has an issue with folded times in DST.
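
For reference, "folded time" here most likely refers to the `fold` attribute that PEP 495 added to `datetime` in Python 3.6. Below is a minimal sketch of what it distinguishes. It assumes Python 3.9+ (for the standard-library `zoneinfo` module), that system time zone data is installed, and `Europe/Vienna` as an example zone; the report itself does not name a zone.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Assumed example zone; the report does not say which zone was in use.
tz = ZoneInfo("Europe/Vienna")

# On 2015-10-25 clocks went back from 03:00 to 02:00, so 02:30 occurred twice.
first = datetime(2015, 10, 25, 2, 30, tzinfo=tz)           # fold=0: first occurrence (summer time)
second = datetime(2015, 10, 25, 2, 30, fold=1, tzinfo=tz)  # fold=1: second occurrence (standard time)

offsets = (first.utcoffset(), second.utcoffset())
print(offsets)
```

Note that `fold` concerns the repeated hour in autumn, whereas the timestamps in this report fall into the skipped hour in spring.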
> to_timestamp returns null for summer time switch dates
> ------------------------------------------------------
>
> Key: SPARK-28515
> URL: https://issues.apache.org/jira/browse/SPARK-28515
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.3
> Environment: Spark 2.4.3 on Linux 64bit, openjdk-8-jre-headless
> Reporter: Andreas Költringer
> Priority: Major
>
> I am not sure whether this is a bug, but the behavior was very unexpected, so I'd like some clarification.
> When parsing datetime strings, if the date-time in question falls into the gap created by a "summer time switch" (e.g. in most of Europe, clocks were moved forward from 2am to 3am on 2015-03-29), the {{to_timestamp}} function returns {{NULL}}.
> Minimal Example (using Python):
> {code:python}
> >>> from pyspark.sql import functions as F
> >>> df = spark.createDataFrame([('201503290159',), ('201503290200',)], ['date_str'])
> >>> df.withColumn('timestamp', F.to_timestamp('date_str', 'yyyyMMddhhmm')).show()
> +------------+-------------------+
> |    date_str|          timestamp|
> +------------+-------------------+
> |201503290159|2015-03-29 01:59:00|
> |201503290200|               null|
> +------------+-------------------+
> {code}
> A solution (or workaround) is to set the time zone for Spark to UTC:
> {{spark.conf.set("spark.sql.session.timeZone", "UTC")}}
> (see e.g. https://stackoverflow.com/q/52594762)
> Plain Java does not do this, e.g. this works as expected:
>
> {code:java}
> import java.sql.Timestamp;
> import java.text.SimpleDateFormat;
> import java.util.Date;
> SimpleDateFormat dateFormat = new SimpleDateFormat("yyyyMMddhhmm");
> Date parsedDate = dateFormat.parse("201503290201");
> Timestamp timestamp = new Timestamp(parsedDate.getTime());
> {code}
> So, is this really the intended behaviour? Is there documentation about this? THX.
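
The gap described in the report can be checked outside Spark. The sketch below is only an illustration and not what Spark does internally: it assumes Python 3.9+ with `zoneinfo`, installed system time zone data, and `Europe/Vienna` as a stand-in for the reporter's session time zone. The helper `exists_in_zone` is a hypothetical name introduced here. It round-trips a wall-clock time through UTC; a time that falls into the skipped hour comes back as a different wall-clock time.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def exists_in_zone(naive: datetime, zone: str) -> bool:
    """Return True if the naive wall-clock time actually occurs in `zone`."""
    tz = ZoneInfo(zone)
    local = naive.replace(tzinfo=tz)
    # Convert to UTC and back; a wall time inside a DST gap does not survive the trip.
    round_trip = local.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) == naive


# Europe/Vienna is an assumed example zone, not taken from the report.
print(exists_in_zone(datetime(2015, 3, 29, 1, 59), "Europe/Vienna"))  # True
print(exists_in_zone(datetime(2015, 3, 29, 2, 0), "Europe/Vienna"))   # False: inside the skipped hour
print(exists_in_zone(datetime(2015, 3, 29, 2, 0), "UTC"))             # True: UTC has no DST gaps
```

The last call is consistent with the workaround quoted above: in UTC every wall-clock time exists, so no input can land in a gap.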