Posted to reviews@kudu.apache.org by "Grant Henke (Code Review)" <ge...@cloudera.org> on 2018/06/13 19:59:24 UTC

[kudu-CR] Kudu Backup/Restore Spark Jobs

Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10375

to look at the new patch set (#16).

Change subject: Kudu Backup/Restore Spark Jobs
......................................................................

Kudu Backup/Restore Spark Jobs

Adds a rough base implementation of the Kudu backup and restore
Spark jobs. Many TODOs mark gaps, additional testing, and details
still to be finished. However, these base jobs work and are in a
functional state that can be committed and iterated on as we build
up and improve our backup functionality.

These jobs, as annotated, should be considered private, unstable,
and experimental.

The backup job can output the data of one or more tables to any
Spark-compatible path in any Spark-compatible format; the defaults
are HDFS and Parquet. Each table’s data is written to a subdirectory
of the provided path, named after the URL-encoded table name. In
addition, a JSON metadata file is written to each table’s directory
with the metadata needed to recreate the exported table when
restoring.
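
As a rough illustration of the directory layout described above, the
sketch below maps a table name to its backup subdirectory. The object
and method names (BackupLayout, encodeTableName, decodeTableName,
tableDir) are hypothetical and are not taken from the patch; the use
of java.net.URLEncoder with UTF-8 is an assumption about how the
encoding might be done, not a statement of what the jobs actually do.

```scala
import java.net.{URLDecoder, URLEncoder}
import java.nio.charset.StandardCharsets

// Hypothetical helper (not part of the patch): shows how a URL-encoded
// table name could serve as a single, path-safe subdirectory name.
object BackupLayout {
  private val Utf8 = StandardCharsets.UTF_8.name()

  // Encode characters such as ':' and '/' so the table name cannot
  // introduce extra path segments.
  def encodeTableName(tableName: String): String =
    URLEncoder.encode(tableName, Utf8)

  // Recover the original table name from a subdirectory name.
  def decodeTableName(dirName: String): String =
    URLDecoder.decode(dirName, Utf8)

  // Join the root backup path and the encoded table name.
  def tableDir(rootPath: String, tableName: String): String =
    s"${rootPath.stripSuffix("/")}/${encodeTableName(tableName)}"
}
```

Under these assumptions, a table named impala::default.my_table backed
up under hdfs:///kudu-backups would land in
hdfs:///kudu-backups/impala%3A%3Adefault.my_table, and decoding the
subdirectory name returns the original table name.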

The restore job can read the data and metadata generated by the
backup job, create “restore” tables with a matching schema, and
reload the data.

The job arguments are a work in progress and will likely be enhanced
and simplified as we find what is useful and what isn’t through
performance and functional testing. More documentation will be
generated when the jobs are ready for general use.

Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
---
M java/gradle/dependencies.gradle
A java/kudu-backup/build.gradle
A java/kudu-backup/pom.xml
A java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/resources/log4j.properties
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Type.java
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala
M java/pom.xml
M java/settings.gradle
18 files changed, 1,568 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/10375/16
-- 
To view, visit http://gerrit.cloudera.org:8080/10375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7
Gerrit-Change-Number: 10375
Gerrit-PatchSet: 16
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>