Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2020/03/23 06:48:00 UTC
[jira] [Resolved] (SPARK-31211) Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5
[ https://issues.apache.org/jira/browse/SPARK-31211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-31211.
---------------------------------
Fix Version/s: 3.0.0
Resolution: Fixed
Issue resolved by pull request 27974
[https://github.com/apache/spark/pull/27974]
> Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5
> ---------------------------------------------------------------
>
> Key: SPARK-31211
> URL: https://issues.apache.org/jira/browse/SPARK-31211
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Assignee: Maxim Gekk
> Priority: Major
> Fix For: 3.0.0
>
>
> Save a date that is valid in the Julian calendar, in which year 1000 is a leap year, with Spark 2.4.5, for instance 1000-02-29:
> {code}
> $ export TZ="America/Los_Angeles"
> {code}
> {code:scala}
> scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> scala> val df = Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
> df: org.apache.spark.sql.DataFrame = [date: date]
> scala> df.show
> +----------+
> |      date|
> +----------+
> |1000-02-29|
> +----------+
> scala> df.write.mode("overwrite").format("avro").save("/Users/maxim/tmp/before_1582/2_4_5_date_avro_leap")
> scala> df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap")
> {code}
> Load the parquet files back with Spark 3.1.0-SNAPSHOT:
> {code:scala}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
>       /_/
> Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_231)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
> +----------+
> |      date|
> +----------+
> |1000-03-06|
> +----------+
> scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true)
> scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
> 20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
> java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a leap year
> at java.time.LocalDate.create(LocalDate.java:429)
> at java.time.LocalDate.of(LocalDate.java:269)
> at org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008)
> {code}
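For context, the failure stems from the two calendars disagreeing on leap years: Spark 2.4 writes dates through the hybrid Julian/Gregorian calendar backing `java.sql.Date`, while Spark 3.0 reads them through the proleptic Gregorian calendar of `java.time`. A minimal sketch of the discrepancy, using plain JDK calls with no Spark required:

```scala
import java.time.{LocalDate, Year}
import java.util.GregorianCalendar

object LeapYearMismatch extends App {
  // Hybrid calendar (Julian rules before the 1582 cutover): every 4th
  // year is a leap year, so 1000-02-29 is a valid date on the write path.
  val hybrid = new GregorianCalendar()
  println(hybrid.isLeapYear(1000)) // true

  // Proleptic Gregorian (java.time): centuries not divisible by 400 are
  // common years, so year 1000 has no February 29 on the read path.
  println(Year.isLeap(1000)) // false

  // Rebasing the stored day therefore hits the exception from the stack
  // trace above when it tries to build the same calendar date.
  try LocalDate.of(1000, 2, 29)
  catch { case e: java.time.DateTimeException => println(e.getMessage) }
}
```

This is why reading without rebasing silently shifts the date (the raw day count is reinterpreted in the new calendar), while reading with rebasing enabled throws: the Julian date has no exact counterpart in the proleptic Gregorian calendar.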
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org