Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2020/03/23 06:48:00 UTC

[jira] [Resolved] (SPARK-31211) Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5

     [ https://issues.apache.org/jira/browse/SPARK-31211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-31211.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 27974
[https://github.com/apache/spark/pull/27974]

> Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5
> ---------------------------------------------------------------
>
>                 Key: SPARK-31211
>                 URL: https://issues.apache.org/jira/browse/SPARK-31211
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Save a date that is valid in the Julian calendar (1000 is a leap year there), for instance 1000-02-29, with Spark 2.4.5:
> {code}
> $ export TZ="America/Los_Angeles"
> {code}
> {code:scala}
> scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> scala> val df = Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
> df: org.apache.spark.sql.DataFrame = [date: date]
> scala> df.write.mode("overwrite").format("avro").save("/Users/maxim/tmp/before_1582/2_4_5_date_avro_leap")
> scala> df.show
> +----------+
> |      date|
> +----------+
> |1000-02-29|
> +----------+
> scala> df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap")
> {code}
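> A minimal standalone sketch (plain Scala, no Spark needed) of why this particular date is problematic: 1000 is a leap year in the hybrid Julian/Gregorian calendar used by the legacy java.util.Calendar and java.sql.Date APIs, but not in the Proleptic Gregorian calendar that java.time uses:
> {code:scala}
> // Legacy hybrid calendar (Julian before the 1582-10-15 cutover): 1000-02-29 is a valid date.
> val cal = new java.util.GregorianCalendar(java.util.TimeZone.getTimeZone("America/Los_Angeles"))
> cal.clear()
> cal.set(1000, java.util.Calendar.FEBRUARY, 29)
> println(cal.getTime)  // prints a valid day in February 1000
>
> // Proleptic Gregorian calendar (java.time): year 1000 is not a leap year, so this throws
> // java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a leap year
> java.time.LocalDate.of(1000, 2, 29)
> {code}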
> Load the parquet files back with Spark 3.1.0-SNAPSHOT:
> {code:scala}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
>       /_/
> Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
> +----------+
> |      date|
> +----------+
> |1000-03-06|
> +----------+
> scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true)
> scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
> 20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
> java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a leap year
> 	at java.time.LocalDate.create(LocalDate.java:429)
> 	at java.time.LocalDate.of(LocalDate.java:269)
> 	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008)
> {code}
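> The failure comes from rebasing the stored day count from the hybrid Julian calendar to the Proleptic Gregorian one: the rebase recovers the calendar fields (year 1000, February 29) and passes them straight to java.time.LocalDate.of, which rejects them. A hedged standalone sketch of one way such a rebase could tolerate these dates (the helper below is illustrative only, not the actual fix in pull request 27974):
> {code:scala}
> import java.time.LocalDate
> import java.util.{Calendar, GregorianCalendar, TimeZone}
>
> val millisPerDay = 24L * 60 * 60 * 1000
>
> // Hypothetical helper, for illustration only: turn a day count written under the
> // hybrid Julian calendar into a Proleptic Gregorian epoch day, sliding dates that
> // do not exist in the Gregorian calendar (such as 1000-02-29) to the next valid day.
> def rebaseJulianToGregorianDaysLenient(julianDays: Int): Int = {
>   // Recover the hybrid-calendar fields with the legacy API (interpreted in UTC).
>   val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
>   cal.clear()
>   cal.setTimeInMillis(julianDays * millisPerDay)
>   val year = cal.get(Calendar.YEAR)
>   val month = cal.get(Calendar.MONTH) + 1
>   val day = cal.get(Calendar.DAY_OF_MONTH)
>   // Build the Gregorian date leniently instead of calling LocalDate.of(year, month, day),
>   // which throws for 1000-02-29.
>   LocalDate.of(year, month, 1).plusDays(day - 1).toEpochDay.toInt
> }
> {code}
> Under this sketch, the day count written for 1000-02-29 maps to 1000-03-01 instead of throwing.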



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
