Posted to issues@spark.apache.org by "Liu Neng (Jira)" <ji...@apache.org> on 2020/12/04 03:47:00 UTC

[jira] [Comment Edited] (SPARK-33632) to_date doesn't behave as documented

    [ https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243681#comment-17243681 ] 

Liu Neng edited comment on SPARK-33632 at 12/4/20, 3:46 AM:
------------------------------------------------------------

This is not an issue; you may have misunderstood the docs.

You should use the pattern M/d/yy (note that 'M' is month-of-year, while lowercase 'm' is minute-of-hour). The parse mode is determined by the count of the letter 'y': a single 'y' parses the digits as the literal year value, while 'yy' parses a two-digit year reduced relative to a base of 2000.

Below is the relevant source code from DateTimeFormatterBuilder.

!image-2020-12-04-11-45-10-379.png!
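To illustrate the point above, here is a minimal sketch using plain java.time (outside Spark; Spark 3.x delegates to DateTimeFormatter for these patterns, but the object and variable names below are illustrative, not from the issue):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object YearPatternDemo {
  def main(args: Array[String]): Unit = {
    // "yy" triggers reduced-value parsing with base year 2000,
    // so the two digits "20" resolve to 2020
    val twoDigitYear = DateTimeFormatter.ofPattern("M/d/yy")
    println(LocalDate.parse("10/31/20", twoDigitYear)) // 2020-10-31

    // a single "y" parses the digits as the literal year value,
    // so "20" resolves to year 20 AD
    val literalYear = DateTimeFormatter.ofPattern("M/d/y")
    println(LocalDate.parse("10/31/20", literalYear)) // 0020-10-31
  }
}
```

This reproduces both halves of the report: the reporter's "0020" year comes from single-letter year parsing, and switching to "yy" yields the expected 2020.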


was (Author: qwe1398775315):
you should use pattern m/d/yy, parse mode is determined by count of letter 'y'.

below is source code from DateTimeFormatterBuilder.

!image-2020-12-04-11-45-10-379.png!

> to_date doesn't behave as documented
> ------------------------------------
>
>                 Key: SPARK-33632
>                 URL: https://issues.apache.org/jira/browse/SPARK-33632
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: Frank Oosterhuis
>            Priority: Major
>         Attachments: image-2020-12-04-11-45-10-379.png
>
>
> I'm trying to use to_date on a string formatted as "10/31/20".
> Expected output is "2020-10-31".
> Actual output is "0020-01-31".
> The [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html] suggests 2020 or 20 as input for "y".
> Example below. Expected behaviour is included in the udf.
> {code:scala}
> import java.sql.Date
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions.{to_date, udf}
> object ToDate {
>   val toDate = udf((date: String) => {
>     val split = date.split("/")
>     val month = "%02d".format(split(0).toInt)
>     val day = "%02d".format(split(1).toInt)
>     val year = split(2).toInt + 2000
>     Date.valueOf(s"${year}-${month}-${day}")
>   })
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder().master("local[2]").getOrCreate()
>     spark.sparkContext.setLogLevel("ERROR")
>     import spark.implicits._
>     Seq("1/1/20", "10/31/20")
>       .toDF("raw")
>       .withColumn("to_date", to_date($"raw", "m/d/y"))
>       .withColumn("udf", toDate($"raw"))
>       .show
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org