Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/06/05 19:51:00 UTC

[jira] [Commented] (SPARK-31557) Legacy parser incorrectly interprets pre-Gregorian dates

    [ https://issues.apache.org/jira/browse/SPARK-31557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127055#comment-17127055 ] 

Apache Spark commented on SPARK-31557:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28398

> Legacy parser incorrectly interprets pre-Gregorian dates
> --------------------------------------------------------
>
>                 Key: SPARK-31557
>                 URL: https://issues.apache.org/jira/browse/SPARK-31557
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Major
>             Fix For: 3.0.0
>
>
> With CSV:
> {noformat}
> scala> sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
> res0: org.apache.spark.sql.DataFrame = [key: string, value: string]
> scala> val seq = Seq("0002-01-01", "1000-01-01", "1500-01-01", "1800-01-01").map(x => s"$x,$x")
> seq: Seq[String] = List(0002-01-01,0002-01-01, 1000-01-01,1000-01-01, 1500-01-01,1500-01-01, 1800-01-01,1800-01-01)
> scala> val ds = seq.toDF("value").as[String]
> ds: org.apache.spark.sql.Dataset[String] = [value: string]
> scala> spark.read.schema("expected STRING, actual DATE").csv(ds).show
> +----------+----------+
> |  expected|    actual|
> +----------+----------+
> |0002-01-01|0001-12-30|
> |1000-01-01|1000-01-06|
> |1500-01-01|1500-01-10|
> |1800-01-01|1800-01-01|
> +----------+----------+
> scala> 
> {noformat}
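>
> The shifted values match the offset between the Julian and proleptic Gregorian calendars: the legacy parser is backed by java.text.SimpleDateFormat, whose hybrid calendar is Julian before the 1582 cutover, while Spark 3.0 stores dates as proleptic Gregorian days since the epoch. Here is a minimal standalone sketch (plain Scala, not Spark's internal code) that reproduces the same offsets under that assumption, by parsing with SimpleDateFormat and re-reading the resulting day count as proleptic Gregorian:
> {noformat}
> import java.text.SimpleDateFormat
> import java.time.LocalDate
> import java.util.TimeZone
>
> val fmt = new SimpleDateFormat("yyyy-MM-dd")
> fmt.setTimeZone(TimeZone.getTimeZone("UTC"))  // parse at UTC midnight so millis divide evenly into days
>
> for (s <- Seq("0002-01-01", "1000-01-01", "1500-01-01", "1800-01-01")) {
>   // Days since the epoch under SimpleDateFormat's hybrid Julian/Gregorian calendar.
>   val hybridDays = Math.floorDiv(fmt.parse(s).getTime, 86400000L)
>   // Re-interpreting that day count as proleptic Gregorian yields the shifted date.
>   println(s"$s -> ${LocalDate.ofEpochDay(hybridDays)}")
> }
> {noformat}
> This prints 0001-12-30, 1000-01-06, 1500-01-10, and 1800-01-01, matching the "actual" column above; after the 1582 cutover the two calendars agree, so 1800-01-01 is unchanged.
>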
> Similarly, with JSON:
> {noformat}
> scala> sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
> res0: org.apache.spark.sql.DataFrame = [key: string, value: string]
> scala> val seq = Seq("0002-01-01", "1000-01-01", "1500-01-01", "1800-01-01").map { x =>
>      |   s"""{"expected": "$x", "actual": "$x"}"""
>      | }
> seq: Seq[String] = List({"expected": "0002-01-01", "actual": "0002-01-01"}, {"expected": "1000-01-01", "actual": "1000-01-01"}, {"expected": "1500-01-01", "actual": "1500-01-01"}, {"expected": "1800-01-01", "actual": "1800-01-01"})
> scala> 
> scala> val ds = seq.toDF("value").as[String]
> ds: org.apache.spark.sql.Dataset[String] = [value: string]
> scala> spark.read.schema("expected STRING, actual DATE").json(ds).show
> +----------+----------+
> |  expected|    actual|
> +----------+----------+
> |0002-01-01|0001-12-30|
> |1000-01-01|1000-01-06|
> |1500-01-01|1500-01-10|
> |1800-01-01|1800-01-01|
> +----------+----------+
> scala> 
> {noformat}
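>
> For comparison, setting spark.sql.legacy.timeParserPolicy=CORRECTED switches to the java.time-based parser, which is proleptic Gregorian end to end, so these dates should round-trip unchanged. A quick check, reusing the ds from the CSV example (commands only; output not captured here):
> {noformat}
> scala> sql("set spark.sql.legacy.timeParserPolicy=CORRECTED")
> scala> spark.read.schema("expected STRING, actual DATE").csv(ds).show
> {noformat}
> Under that policy the actual column should equal expected for all four rows.
>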



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org