Posted to commits@kudu.apache.org by aw...@apache.org on 2021/03/21 05:31:53 UTC

[kudu] branch master updated: [backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite

This is an automated email from the ASF dual-hosted git repository.

awong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git


The following commit(s) were added to refs/heads/master by this push:
     new a0db990  [backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite
a0db990 is described below

commit a0db990e08173293e42a7490322f08681abaa5d3
Author: Andrew Wong <aw...@cloudera.com>
AuthorDate: Sat Mar 20 21:04:43 2021 -0700

    [backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite
    
    After the bump to Spark 3.1.1, TestKuduBackup.testRandomBackupAndRestore
    started failing with errors like the following:
    
    02:04:37.919 [ERROR - Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] (Logging.scala:94) Aborting task
    org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into Parquet INT96 files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set spark.sql.legacy.parquet.int96RebaseModeInWrite [...]
    	at org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInWrite(DataSourceUtils.scala:165) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
    ...
    
    Per the instructions in that error message, this sets the int96RebaseModeInWrite option.
    
    Change-Id: Ib9ca4d9e69785dd9d056fa8e62c944d56cf219ed
    Reviewed-on: http://gerrit.cloudera.org:8080/17213
    Reviewed-by: Grant Henke <gr...@apache.org>
    Tested-by: Andrew Wong <aw...@cloudera.com>
---
 java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala b/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
index c02f5de..13dcc5f 100644
--- a/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
+++ b/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
@@ -86,6 +86,7 @@ object KuduBackup {
     // 1900-01-01T00:00:00Z in Parquet. Otherwise incorrect values may be read by
     // Spark 2 or legacy version of Hive. See more details in SPARK-31404.
     session.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
+    session.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")
 
     // Write the data to the backup path.
     // The backup path contains the timestampMs and should not already exist.
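
For readers who want to see the failure mode and the fix outside of the backup job, the following is a minimal, self-contained sketch and is not part of this commit: it starts a local Spark 3.1.x session, forces INT96 timestamps, applies the same two LEGACY rebase settings, and writes a pre-1900 timestamp to Parquet. The object name, local master, and output path are illustrative placeholders.

    import java.sql.Timestamp
    import org.apache.spark.sql.SparkSession

    object Int96RebaseExample {
      def main(args: Array[String]): Unit = {
        // Illustrative local session; KuduBackup configures its own session elsewhere.
        val session = SparkSession.builder()
          .master("local[*]")
          .appName("int96-rebase-example")
          .getOrCreate()

        // Store timestamps as Parquet INT96, the representation the error above refers to.
        session.conf.set("spark.sql.parquet.outputTimestampType", "INT96")

        // Rebase dates/timestamps to the legacy hybrid (Julian + Gregorian) calendar on
        // write so Spark 2.x and legacy Hive readers interpret them correctly. Without
        // the int96 setting, Spark 3.1 raises SparkUpgradeException for the pre-1900
        // value below, which is the test failure described in the commit message.
        session.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
        session.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "LEGACY")

        import session.implicits._
        val df = Seq(Timestamp.valueOf("1800-01-01 00:00:00")).toDF("ts")
        df.write.mode("overwrite").parquet("/tmp/int96-rebase-example")

        session.stop()
      }
    }

Setting the two confs on the session, as the diff above does, keeps the backup output readable by older Parquet consumers; the same keys could also be supplied externally as ordinary Spark SQL configuration if modifying the job were not an option.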