You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Grant Henke (Code Review)" <ge...@cloudera.org> on 2019/03/28 20:03:22 UTC

[kudu-CR] WIP: [backup] Add incremental backup/restore support

Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12879


Change subject: WIP: [backup] Add incremental backup/restore support
......................................................................

WIP: [backup] Add incremental backup/restore support

Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
---
M java/kudu-backup/src/main/protobuf/backup.proto
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Schema.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/Operation.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
16 files changed, 662 insertions(+), 389 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/12879/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <gr...@apache.org>

[kudu-CR] WIP: [backup] Add incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: WIP: [backup] Add incremental backup/restore support
......................................................................


Patch Set 2: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 12 Apr 2019 21:48:57 +0000
Gerrit-HasComments: No

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 4: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Apr 2019 20:34:10 +0000
Gerrit-HasComments: No

[kudu-CR] WIP: [backup] Add incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12879

to look at the new patch set (#2).

Change subject: WIP: [backup] Add incremental backup/restore support
......................................................................

WIP: [backup] Add incremental backup/restore support

This patch adds initial support for incremental backup
and restore.

A high level overview of the changes in this patch is:
- Added a version to the TableMetadata for future use.
- Broke out io/layout logic to a SessionIO class so it
   could be easily shared.
- Unified the BackupOptions and RestoreOptions
   so common options could be shared.
- Introduced a BackupGraph class to handle chaining
   together backups for backup and restore jobs.
- Enhanced the BackupRDD to output an additional
   RowAction byte column on backup and restore.
- Enhanced the restore job to use the new RowAction
   column and translate them into operations for
   incremental restore jobs.
- Added the ability to restore to a given “time” on a
   per backup basis.
- Added example usage docs to the Backup and Restore
   classes.
- Added hasIsDeleted to RowResult and hasColumn to
   Schema.

TODO: More testing and docs.

Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
---
M java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/RowAction.java
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestBackupGraph.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Schema.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
18 files changed, 1,065 insertions(+), 389 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/12879/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:               // Get the operation type based on the change type column.
             :               // This will always be the last column in the row.
             :               val rowActionValue = row.getByte(row.length - 1)
> OK makes sense. Would like for design decisions like these to be documented
I added detailed documentation to the RowAction enum.



-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Apr 2019 20:05:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto@118
PS2, Line 118:   PartitionMetadataPB partitions = 8;
> Skipped 8?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@163
PS2, Line 163:  * Node class to represent nodes in the backup graph.
> Curious why you went with the noun "vertex"; isn't "node" the more commonly
No specific reason, will change to Node.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@35
PS2, Line 35:  */
> Nit: Javadoc not quite formatted properly?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@58
PS2, Line 58:       // use the `to_ms` metadata as the `from_ms` time for this backup.
> Nit: too long?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@144
PS2, Line 144:     val columnCount = if (incremental) fieldCount - 1 else fieldCount
> This function is called per-row, right? I wonder if it'd be more performant
We could assume the column count is static across the life of an RDD and memoize this calculation, but the existing implementation didn't do that, so I kept the behavior the same. The additional if check should be cheap relative to the other work in the method.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@39
PS2, Line 39:  */
> Formatting?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@54
PS2, Line 54:     // TODO: Make parallel so each table isn't processed serially.
            :     options.tables.foreach { tableName =>
            :       val graph = io.readBackupGraph(tableName).filterByTim
> Did you test what effect this has? If it may have a drastic impact on perfo
This wasn't meant to be included, I was experimenting with it. I will remove it.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:               // Generate an operation based on the row action.
             :               val operation = rowAction match {
             :                 case RowAction.UPSERT => table.newUpsert()
> Does Scala allow returning tuples like in Python?

Yes it does

> Seems like it'd be more elegant to return the row action separately from the row itself.

The Row abstraction is a result of the Spark framework. It expects Dataframes to work with Rows and is less flexible than the raw RDD API.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37
PS2, Line 37:     tables: Seq[String],
            :     rootPath: String,
            :     kuduMasterAddresses: String = InetAddress.getLocalHost.getCanonicalHostName,
> Why do these three need to be listed here and in RestoreOptions if both ext
The CommonOptions trait is specifying that BackupOptions/RestoreOptions needs to implement a value function for tables, rootPath, and  kuduMasterAddresses. But it doesn't specify "how". BackupOptions/RestoreOptions implement those functions via a constructor.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@93
PS2, Line 93:         // TODO: Document the limitations based on cluster configuration
> Nit: too long?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala@59
PS2, Line 59:  */
> Could you doc at least the interesting methods here?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Apr 2019 18:26:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: [backup] Add incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has removed a vote on this change.

Change subject: WIP: [backup] Add incremental backup/restore support
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12879

to look at the new patch set (#4).

Change subject: [backup] Add initial incremental backup/restore support
......................................................................

[backup] Add initial incremental backup/restore support

This patch adds initial support for incremental backup
and restore.

A high level overview of the changes in this patch is:
- Added a version to the TableMetadata for future use.
- Broke out io/layout logic to a SessionIO class so it
   could be easily shared.
- Unified the BackupOptions and RestoreOptions
   so common options could be shared.
- Introduced a BackupGraph class to handle chaining
   together backups for backup and restore jobs.
- Enhanced the BackupRDD to output an additional
   RowAction byte column on backup and restore.
- Enhanced the restore job to use the new RowAction
   column and translate them into operations for
   incremental restore jobs.
- Added the ability to restore to a given “time” on a
   per backup basis.
- Added example usage docs to the Backup and Restore
   classes.
- Added hasIsDeleted to RowResult and hasColumn to
   Schema.

Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
---
M java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/RowAction.java
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestBackupGraph.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Schema.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
18 files changed, 1,127 insertions(+), 396 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/12879/4
-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12879

to look at the new patch set (#3).

Change subject: [backup] Add initial incremental backup/restore support
......................................................................

[backup] Add initial incremental backup/restore support

This patch adds initial support for incremental backup
and restore.

A high level overview of the changes in this patch is:
- Added a version to the TableMetadata for future use.
- Broke out io/layout logic to a SessionIO class so it
   could be easily shared.
- Unified the BackupOptions and RestoreOptions
   so common options could be shared.
- Introduced a BackupGraph class to handle chaining
   together backups for backup and restore jobs.
- Enhanced the BackupRDD to output an additional
   RowAction byte column on backup and restore.
- Enhanced the restore job to use the new RowAction
   column and translate them into operations for
   incremental restore jobs.
- Added the ability to restore to a given “time” on a
   per backup basis.
- Added example usage docs to the Backup and Restore
   classes.
- Added hasIsDeleted to RowResult and hasColumn to
   Schema.

Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
---
M java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/RowAction.java
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestBackupGraph.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Schema.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
18 files changed, 1,112 insertions(+), 396 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/12879/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:               // Generate an operation based on the row action.
             :               val operation = rowAction match {
             :                 case RowAction.UPSERT => table.newUpsert()
> > Does Scala allow returning tuples like in Python?
OK, I was missing two things here:
1. There's an extra layer of indirection between this and the iterator in KuduBackupRDD, and the internal row is the only mechanism to pass data from one to the other.
2. More importantly, KuduBackupRDD is the read-side of the equation while this is the write-side; there's a persistence step in between, and we need to serialize the row action in order for it to survive the persistence.

Next question: why bother converting is_deleted in KuduBackupRDD at all? Why not persist it as-is and convert it into a row action on-the-fly during restore? Then you won't need a persistence model for RowAction; it could be a simple enum, or maybe not exist at all (you could switch here on the value of the boolean is_deleted column).


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37
PS2, Line 37:     tables: Seq[String],
            :     rootPath: String,
            :     kuduMasterAddresses: String = InetAddress.getLocalHost.getCanonicalHostName,
> The CommonOptions trait is specifying that BackupOptions/RestoreOptions nee
Hmm, not really seeing the point of the trait then, at least not in terms of reducing LOC. Is it used somewhere else in the patch that I missed?



-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Apr 2019 19:17:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:               // Generate an operation based on the row action.
             :               val operation = rowAction match {
             :                 case RowAction.UPSERT => table.newUpsert()
> > why bother converting is_deleted in KuduBackupRDD at all?
OK makes sense. Would like for design decisions like these to be documented in the code somewhere though.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37
PS2, Line 37:     tables: Seq[String],
            :     rootPath: String,
            :     kuduMasterAddresses: String = InetAddress.getLocalHost.getCanonicalHostName,
> The benefit of the trait is seen in SessionIO, I can use BackupOptions as a
I did indeed miss that, thanks.



-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Apr 2019 19:39:39 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP: [backup] Add incremental backup/restore support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: WIP: [backup] Add incremental backup/restore support
......................................................................


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto@118
PS2, Line 118:   int32 version = 9;
Skipped 8?

Since this is a new message type, I'd actually make this the first field and move it to the top, to increase its visibility.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@163
PS2, Line 163:  * Vertex class to represent vertices in the backup graph.
Curious why you went with the noun "vertex"; isn't "node" the more commonly used term in CS?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@35
PS2, Line 35:  * */
Nit: Javadoc not quite formatted properly?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@58
PS2, Line 58:       // find the previous backup and use the `to_ms` metadata as the `from_ms` time for this backup.
Nit: too long?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@144
PS2, Line 144:     val columnCount = if (incremental) fieldCount - 1 else fieldCount
This function is called per-row, right? I wonder if it'd be more performant to have a "full" version and an "incremental" version rather than checking "if incremental" with each call.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@39
PS2, Line 39:  * */
Formatting?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@54
PS2, Line 54:     // Disable file splitting to keep one Spark task per Kudu partition
            :     // and reduce compaction when restoring.
            :     session.conf.set(ParquetInputFormat.SPLIT_FILES, false)
Did you test what effect this has? If it may have a drastic impact on performance, may want to separate it into its own patch so it's not buried in this large incremental backup patch.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:               // Get the operation type based on the change type column.
             :               // This will always be the last column in the row.
             :               val rowActionValue = row.getByte(row.length - 1)
Does Scala allow returning tuples like in Python? Seems like it'd be more elegant to return the row action separately from the row itself. Maybe more efficient too; you wouldn't need to convert it from an enum into a byte and back.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37
PS2, Line 37:     tables: Seq[String],
            :     rootPath: String,
            :     kuduMasterAddresses: String = InetAddress.getLocalHost.getCanonicalHostName,
Why do these three need to be listed here and in RestoreOptions if both extend CommonOptions?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@93
PS2, Line 93:         // TODO: Document the limitations based on cluster configuration (ex: ancient history watermark).
Nit: too long?


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala@59
PS2, Line 59: class SessionIO(val session: SparkSession, options: CommonOptions) {
Could you doc at least the interesting methods here?



-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Apr 2019 04:21:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12879/5/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala:

http://gerrit.cloudera.org:8080/#/c/12879/5/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@33
PS5, Line 33:   // The key is the ToMs/FromMs of the adjacent backups.
Suggestion:

  // Index of backup.fromMs -> backup for use in chaining backups together.



-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Apr 2019 18:54:32 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 4: Code-Review+1

Would be good to get another set of eyes on this, preferably with Scala and/or Spark expertise.


-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Apr 2019 21:27:02 +0000
Gerrit-HasComments: No

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Apr 2019 21:36:12 +0000
Gerrit-HasComments: No

[kudu-CR] WIP: [backup] Add incremental backup/restore support

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: WIP: [backup] Add incremental backup/restore support
......................................................................


Patch Set 1:

This looks good. To address the potential conflict where a table is renamed and then recreated, we can allow both the renamed table and the newly created table to coexist in the same root dir by naming the directory structure for a table with the table id as a component and storing both current table name and table id in a metadata file.

We can document the desired behavior of a given naming conflict as: if two tables had the same name, they never had the same name at the same time. When restoring a table from backup by name and timestamp, we select the backup to restore as the backup of the table with that name at that time.


-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Apr 2019 04:50:47 +0000
Gerrit-HasComments: No

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12879/5/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala:

http://gerrit.cloudera.org:8080/#/c/12879/5/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@33
PS5, Line 33:   // Index of backup.fromMs -> backup for use in chaining backups together.
> Suggestion:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Apr 2019 21:35:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 4: Code-Review+1

(9 comments)

Looks good.

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@33
PS4, Line 33:   private val adjacencyList = mutable.Map[Long, mutable.ListBuffer[BackupNode]]()
Please add a comment the describing key -> val mapping.


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@34
PS4, Line 34:   private val nodes = mutable.Set[Long]()
Needs comment. Is 'nodes' used for anything?


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@37
PS4, Line 37:   private val fullBackupFromMs = 0
style nit: this looks like a constant, so should be named with InitialCaps?


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@90
PS4, Line 90:       adjacencyList(fromMs).flatMap { node =>
            :         allPaths(node.metadata.getToMs, path ++ List(node))
Wow, Scala is fancy.


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@149
PS4, Line 149: represents what the graph would look like
             :    * at the passed point in time.
suggestion: that represents the graph including only nodes with a ToTime equal to or less than the specified time


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@199
PS4, Line 199:   def pathString: String = backups.map(_.metadata.getFromMs).mkString(" -> ")
This is amazingly terse. Scala has built in method for everything.


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@66
PS4, Line 66:         } else {
We should have a --force-incremental to ensure that a backup administrator can guarantee some kind of SLA if they can't afford the resources to do a full backup and would rather get an error. OK to add this as a TODO item though.


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@56
PS4, Line 56:       val graph = io.readBackupGraph(tableName).filterByTime(options.timestampMs)
What do you think about adding a TODO to add an option for an exact match on "to" timestamp instead of the best match based on timestamp?


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-client/src/main/java/org/apache/kudu/Schema.java
File java/kudu-client/src/main/java/org/apache/kudu/Schema.java:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-client/src/main/java/org/apache/kudu/Schema.java@206
PS4, Line 206:     Integer index = this.columnsByName.get(columnName);
             :     return index != null;
nit: how about: return columnsByName.containsKey(columnName);



-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Apr 2019 04:07:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:               // Get the operation type based on the change type column.
             :               // This will always be the last column in the row.
             :               val rowActionValue = row.getByte(row.length - 1)
> why bother converting is_deleted in KuduBackupRDD at all?

I could represent a boolean with a byte or an enum in that same byte of space, I decided to use an enum since it seemed more flexible for an on-disk format which we would need to support long term.

I imagine we could use RowAction to support INSERT in the future or perhaps UPDATE if we have full fidelity and sparse row backup support. If go the boolean route, any new feature would need a new column.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37
PS2, Line 37:     tables: Seq[String],
            :     rootPath: String,
            :     kuduMasterAddresses: String = InetAddress.getLocalHost.getCanonicalHostName,
> Hmm, not really seeing the point of the trait then, at least not in terms o
The benefit of the trait is seen in SessionIO, I can use BackupOptions as a parameter for functions that are used in both the Backup and Restore jobs.



-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 15 Apr 2019 19:23:15 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has removed a vote on this change.

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12879

to look at the new patch set (#5).

Change subject: [backup] Add initial incremental backup/restore support
......................................................................

[backup] Add initial incremental backup/restore support

This patch adds initial support for incremental backup
and restore.

A high level overview of the changes in this patch is:
- Added a version to the TableMetadata for future use.
- Broke out io/layout logic to a SessionIO class so it
   could be easily shared.
- Unified the BackupOptions and RestoreOptions
   so common options could be shared.
- Introduced a BackupGraph class to handle chaining
   together backups for backup and restore jobs.
- Enhanced the BackupRDD to output an additional
   RowAction byte column on backup and restore.
- Enhanced the restore job to use the new RowAction
   column and translate them into operations for
   incremental restore jobs.
- Added the ability to restore to a given “time” on a
   per backup basis.
- Added example usage docs to the Backup and Restore
   classes.
- Added hasIsDeleted to RowResult and hasColumn to
   Schema.

Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
---
M java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/RowAction.java
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestBackupGraph.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Schema.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
18 files changed, 1,124 insertions(+), 396 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/12879/5
-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
......................................................................


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@33
PS4, Line 33:   private val adjacencyList = mutable.Map[Long, mutable.ListBuffer[BackupNode]]()
> Please add a comment the describing key -> val mapping.
Done


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@34
PS4, Line 34:   private val nodes = mutable.Set[Long]()
> Needs comment. Is 'nodes' used for anything?
I can remove it. I had used it but later removed it.


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@37
PS4, Line 37:   private val fullBackupFromMs = 0
> style nit: this looks like a constant, so should be named with InitialCaps?
Done


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@90
PS4, Line 90:       adjacencyList(fromMs).flatMap { node =>
            :         allPaths(node.metadata.getToMs, path ++ List(node))
> Wow, Scala is fancy.
Done


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@149
PS4, Line 149: represents what the graph would look like
             :    * at the passed point in time.
> suggestion: that represents the graph including only nodes with a ToTime eq
Done


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@199
PS4, Line 199:   def pathString: String = backups.map(_.metadata.getFromMs).mkString(" -> ")
> This is amazingly terse. Scala has built in method for everything.
Done


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@66
PS4, Line 66:         } else {
> We should have a --force-incremental to ensure that a backup administrator 
You can force an incremental by manually setting `fromMs` via the command line options. See `tableOptions.fromMs != 0`.


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@56
PS4, Line 56:       val graph = io.readBackupGraph(tableName).filterByTime(options.timestampMs)
> What do you think about adding a TODO to add an option for an exact match o
Done


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-client/src/main/java/org/apache/kudu/Schema.java
File java/kudu-client/src/main/java/org/apache/kudu/Schema.java:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-client/src/main/java/org/apache/kudu/Schema.java@206
PS4, Line 206:     Integer index = this.columnsByName.get(columnName);
             :     return index != null;
> nit: how about: return columnsByName.containsKey(columnName);
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Apr 2019 13:36:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add initial incremental backup/restore support

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12879

to look at the new patch set (#6).

Change subject: [backup] Add initial incremental backup/restore support
......................................................................

[backup] Add initial incremental backup/restore support

This patch adds initial support for incremental backup
and restore.

A high level overview of the changes in this patch is:
- Added a version to the TableMetadata for future use.
- Broke out io/layout logic to a SessionIO class so it
   could be easily shared.
- Unified the BackupOptions and RestoreOptions
   so common options could be shared.
- Introduced a BackupGraph class to handle chaining
   together backups for backup and restore jobs.
- Enhanced the BackupRDD to output an additional
   RowAction byte column on backup and restore.
- Enhanced the restore job to use the new RowAction
   column and translate them into operations for
   incremental restore jobs.
- Added the ability to restore to a given “time” on a
   per backup basis.
- Added example usage docs to the Backup and Restore
   classes.
- Added hasIsDeleted to RowResult and hasColumn to
   Schema.

Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
---
M java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/RowAction.java
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestBackupGraph.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Schema.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
18 files changed, 1,124 insertions(+), 396 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/12879/6
-- 
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>