Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2021/03/21 04:55:56 UTC

[kudu-CR] [backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite

Hello Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17213

to look at the new patch set (#2).

Change subject: [backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite
......................................................................

[backup] set spark.sql.legacy.parquet.int96RebaseModeInWrite

After the bump to Spark 3.1.1, TestKuduBackup.testRandomBackupAndRestore
started failing with errors like the following:

02:04:37.919 [ERROR - Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] (Logging.scala:94) Aborting task
org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into Parquet INT96 files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set spark.sql.legacy.parquet.int96RebaseModeInWrite to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during writing, to get maximum interoperability. Or set spark.sql.legacy.parquet.int96RebaseModeInWrite to 'CORRECTED' to write the datetime values as it is, if you are 100% sure that the written files will only be read by Spark 3.0+ or other systems that use Proleptic Gregorian calendar.
	at org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInWrite(DataSourceUtils.scala:165) ~[spark-sql_2.12-3.1.1.jar:3.1.1]
...

As the error message instructs, this patch sets the
spark.sql.legacy.parquet.int96RebaseModeInWrite option for backup jobs.
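
For illustration only, here is a minimal sketch of how this option can be
set on a SparkSession. It is not the actual one-line change to
KuduBackup.scala. The object name, app name, and local master are
hypothetical, and the choice of 'LEGACY' is an assumption: the error
message above also allows 'CORRECTED' when the files will only be read by
Spark 3.0+ or other Proleptic Gregorian readers.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone example; not part of the Kudu code base.
object Int96RebaseExample {
  // Config key named in the SparkUpgradeException message.
  val RebaseKey = "spark.sql.legacy.parquet.int96RebaseModeInWrite"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("int96-rebase-example") // hypothetical app name
      .master("local[1]")              // local mode, for illustration only
      // Assumption: LEGACY is chosen here; CORRECTED is the other value
      // the error message suggests.
      .config(RebaseKey, "LEGACY")
      .getOrCreate()

    // The option can also be set on an existing session at runtime.
    spark.conf.set(RebaseKey, "LEGACY")

    // Read the value back to confirm it is in effect for this session.
    println(spark.conf.get(RebaseKey))

    spark.stop()
  }
}
```

With either value set, Spark should no longer raise SparkUpgradeException
when writing pre-1900 timestamps as Parquet INT96. 'LEGACY' rebases the
values for readers that use the hybrid calendar, while 'CORRECTED' writes
them unchanged.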

Change-Id: Ib9ca4d9e69785dd9d056fa8e62c944d56cf219ed
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
1 file changed, 1 insertion(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/13/17213/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17213

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9ca4d9e69785dd9d056fa8e62c944d56cf219ed
Gerrit-Change-Number: 17213
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)