Posted to issues@spark.apache.org by "Leandro Rosa (JIRA)" <ji...@apache.org> on 2019/04/12 14:37:00 UTC

[jira] [Updated] (SPARK-27450) Timestamp cast fails when ISO8601 string omits zero minutes or seconds

     [ https://issues.apache.org/jira/browse/SPARK-27450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leandro Rosa updated SPARK-27450:
---------------------------------
    Description: 
ISO 8601 allows zero-valued minutes, seconds, and milliseconds to be omitted.
{quote}
|hh:mm:ss.sss|_or_|hhmmss.sss|
|hh:mm:ss|_or_|hhmmss|
|hh:mm|_or_|hhmm|
| |hh|
{quote}
{quote}Either the seconds, or the minutes and seconds, may be omitted from the basic or extended time formats for greater brevity but decreased accuracy: [hh]:[mm], [hh][mm] and [hh] are the resulting reduced accuracy time formats
{quote}
Source: [Wikipedia ISO8601|https://en.wikipedia.org/wiki/ISO_8601]

Popular libraries, such as Java's [ZonedDateTime|https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html], respect that. However, the cast to Timestamp fails silently: it returns null instead of raising an error.
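As a rough illustration of the java.time behaviour mentioned above, the sketch below parses the same minute-precision strings used in this report. The object name ReducedPrecisionParse is only illustrative. It covers the omitted-seconds case only: java.time's ISO parsers treat seconds as optional but still require minutes.

```scala
import java.time.{LocalDateTime, OffsetDateTime, ZonedDateTime}

// Illustrative only: the object name is not part of Spark or of this report.
object ReducedPrecisionParse {
  // Seconds are optional in java.time's ISO parsers, so these inputs parse.
  val utc: ZonedDateTime = ZonedDateTime.parse("2017-08-01T02:33Z")
  val offset: OffsetDateTime = OffsetDateTime.parse("2017-08-01T02:33-03:00")
  val local: LocalDateTime = LocalDateTime.parse("2017-08-01T02:33")

  def main(args: Array[String]): Unit = {
    println(utc)
    println(offset)
    println(local)
  }
}
```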

 
{code:scala}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// No TZ offset [OK]
val df1 = Seq("2017-08-01T02:33").toDF("eventTimeString")
val new_df1 = df1
  .withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))

new_df1.show(false)

+----------------+-------------------+
|eventTimeString |eventTimeTS        |
+----------------+-------------------+
|2017-08-01T02:33|2017-08-01 02:33:00|
+----------------+-------------------+
{code}
{code:scala}
// ISO 8601 with UTC designator, seconds omitted [FAIL]
val df2 = Seq("2017-08-01T02:33Z").toDF("eventTimeString")
val new_df2 = df2
  .withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))

new_df2.show(false)

+-----------------+-----------+
|eventTimeString  |eventTimeTS|
+-----------------+-----------+
|2017-08-01T02:33Z|null       |
+-----------------+-----------+
{code}
 
{code:scala}
// ISO 8601 with numeric UTC offset, seconds omitted [FAIL]
val df3 = Seq("2017-08-01T02:33-03:00").toDF("eventTimeString")
val new_df3 = df3
  .withColumn("eventTimeTS", col("eventTimeString").cast(TimestampType))

new_df3.show(false)

+----------------------+-----------+
|eventTimeString       |eventTimeTS|
+----------------------+-----------+
|2017-08-01T02:33-03:00|null       |
+----------------------+-----------+
{code}
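Until the cast handles these inputs, one possible workaround is to pad the missing seconds before casting. The sketch below is not part of Spark: PadSeconds and padSeconds are hypothetical names, and it assumes that the cast accepts the full hh:mm:ss form followed by a zone designator, which this report implies but does not demonstrate.

```scala
// Hypothetical helper, not a Spark API.
object PadSeconds {
  // Date, 'T', hh:mm, then an optional zone designator (Z or +/-hh:mm); no seconds field.
  private val MinutePrecision =
    """^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})(Z|[+-]\d{2}:\d{2})?$""".r

  /** Insert ":00" seconds when the input stops at minute precision; otherwise return it unchanged. */
  def padSeconds(s: String): String = s match {
    // An unmatched optional group is bound to null, hence the Option wrapper.
    case MinutePrecision(dateTime, zone) => dateTime + ":00" + Option(zone).getOrElse("")
    case _ => s
  }
}
```

In a Spark job the function could be wrapped with org.apache.spark.sql.functions.udf and applied to eventTimeString before the cast to TimestampType. Strings that already carry seconds pass through unchanged.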
 

 


> Timestamp cast fails when ISO8601 string omits zero minutes or seconds
> ----------------------------------------------------------------------
>
>                 Key: SPARK-27450
>                 URL: https://issues.apache.org/jira/browse/SPARK-27450
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>         Environment: Spark 2.3.x
>            Reporter: Leandro Rosa
>            Priority: Major
>


