Posted to commits@kudu.apache.org by aw...@apache.org on 2021/03/21 05:31:53 UTC
[kudu] branch master updated: [backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite
This is an automated email from the ASF dual-hosted git repository.
awong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git
The following commit(s) were added to refs/heads/master by this push:
new a0db990 [backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite
a0db990 is described below
commit a0db990e08173293e42a7490322f08681abaa5d3
Author: Andrew Wong <aw...@cloudera.com>
AuthorDate: Sat Mar 20 21:04:43 2021 -0700
[backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite
After the bump to Spark 3.1.1, TestKuduBackup.testRandomBackupAndRestore
started failing with errors like the following:
02:04:37.919 [ERROR - Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] (Logging.scala:94) Aborting task
org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into Parquet INT96 files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set spark.sql.legacy.parquet.int96RebaseModeInWrite [...]
at org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInWrite(DataSourceUtils.scala:165) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
...
Per the instructions in the error message, this patch sets the int96RebaseModeInWrite option.
Change-Id: Ib9ca4d9e69785dd9d056fa8e62c944d56cf219ed
Reviewed-on: http://gerrit.cloudera.org:8080/17213
Reviewed-by: Grant Henke <gr...@apache.org>
Tested-by: Andrew Wong <aw...@cloudera.com>
---
java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala | 1 +
1 file changed, 1 insertion(+)
diff --git a/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala b/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
index c02f5de..13dcc5f 100644
--- a/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
+++ b/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
@@ -86,6 +86,7 @@ object KuduBackup {
// 1900-01-01T00:00:00Z in Parquet. Otherwise incorrect values may be read by
// Spark 2 or legacy version of Hive. See more details in SPARK-31404.
session.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
+ session.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")
// Write the data to the backup path.
// The backup path contains the timestampMs and should not already exist.
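For context, the two rebase-mode settings touched by this commit can be sketched in isolation. This is an illustrative example, not part of the commit; the configuration keys are the real Spark 3.1 ones, but the session setup (app name, local master) is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object RebaseModeExample {
  def main(args: Array[String]): Unit = {
    // Illustrative local session; KuduBackup receives its session elsewhere.
    val session = SparkSession
      .builder()
      .master("local[*]")
      .appName("rebase-mode-example")
      .getOrCreate()

    // Rebase dates before 1582-10-15 and timestamps before 1900-01-01T00:00:00Z
    // onto the legacy hybrid (Julian + Gregorian) calendar when writing Parquet,
    // so Spark 2.x and legacy Hive readers interpret the values correctly.
    // The first key covers DATE/TIMESTAMP columns; the second covers INT96
    // timestamps, which SPARK-31404 tracks separately.
    session.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
    session.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")

    session.stop()
  }
}
```

With both keys set to LEGACY, writes succeed instead of raising SparkUpgradeException; the alternative value CORRECTED writes values as-is under the Proleptic Gregorian calendar, which is only safe when no Spark 2.x or legacy Hive reader will consume the files.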