You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Grant Henke (Code Review)" <ge...@cloudera.org> on 2019/01/24 14:32:06 UTC

[kudu-CR] [backup] Use upsert on restore

Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12268


Change subject: [backup] Use upsert on restore
......................................................................

[backup] Use upsert on restore

Changes the restore job to us upserts instead of
inserts so that Spark task retries do not fail when
a key already exists.

Change-Id: I9905fe4301db8c06fe4f18318ccb04b4179a1601
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
1 file changed, 3 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/12268/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9905fe4301db8c06fe4f18318ccb04b4179a1601
Gerrit-Change-Number: 12268
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <gr...@apache.org>

[kudu-CR] [backup] Use upsert on restore

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12268

to look at the new patch set (#2).

Change subject: [backup] Use upsert on restore
......................................................................

[backup] Use upsert on restore

Changes the restore job to use upserts instead of
inserts so that Spark task retries do not fail when
a key already exists.

Change-Id: I9905fe4301db8c06fe4f18318ccb04b4179a1601
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
1 file changed, 3 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/12268/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9905fe4301db8c06fe4f18318ccb04b4179a1601
Gerrit-Change-Number: 12268
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] [backup] Use upsert on restore

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12268 )

Change subject: [backup] Use upsert on restore
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12268/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12268/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@74
PS1, Line 74:       // Upsert so that Spark task retries do not fail.
> There isn't a mechanism that I can hook into to know if a retry is occurrin
The point being that changing from upsert to something else wouldn't a breaking change, thus this doesn't preclude us from doing something better in the future. Sure, I buy that.



-- 
To view, visit http://gerrit.cloudera.org:8080/12268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9905fe4301db8c06fe4f18318ccb04b4179a1601
Gerrit-Change-Number: 12268
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:06:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Use upsert on restore

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12268 )

Change subject: [backup] Use upsert on restore
......................................................................

[backup] Use upsert on restore

Changes the restore job to use upserts instead of
inserts so that Spark task retries do not fail when
a key already exists.

Change-Id: I9905fe4301db8c06fe4f18318ccb04b4179a1601
Reviewed-on: http://gerrit.cloudera.org:8080/12268
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
---
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
1 file changed, 3 insertions(+), 1 deletion(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9905fe4301db8c06fe4f18318ccb04b4179a1601
Gerrit-Change-Number: 12268
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] [backup] Use upsert on restore

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12268 )

Change subject: [backup] Use upsert on restore
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12268/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12268/1//COMMIT_MSG@9
PS1, Line 9: us
use


http://gerrit.cloudera.org:8080/#/c/12268/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12268/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@74
PS1, Line 74:       // Upsert so that Spark task retries do not fail.
I agree that upserts will be necessary when retrying, and for restoring an incremental backup. But could we still use insert when restoring a full backup and not retrying?

Insert opens the door for future optimizations like disabling duplicate key checking. With upsert we'll always need to do that (in order to convert the upsert into either an insert or an update).

See commit e25348cbb for some past context.



-- 
To view, visit http://gerrit.cloudera.org:8080/12268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9905fe4301db8c06fe4f18318ccb04b4179a1601
Gerrit-Change-Number: 12268
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Thu, 24 Jan 2019 23:58:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Use upsert on restore

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12268 )

Change subject: [backup] Use upsert on restore
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12268/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12268/1//COMMIT_MSG@9
PS1, Line 9: us
> use
Done


http://gerrit.cloudera.org:8080/#/c/12268/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12268/1/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@74
PS1, Line 74:       // Upsert so that Spark task retries do not fail.
> I agree that upserts will be necessary when retrying, and for restoring an 
There isn't a mechanism that I can hook into to know if a retry is occurring or not (that I know of). Best we could do is disable retries all together and fail the entire job . Then we can drop the table and we can retry from scratch.

If we add custom optimizations, they can be handled regardless of the op-type and we can change to op-type used at that point.

Alternatively, we could add some custom handling to truncate a partition on failure to avoid this problem. 

Regardless, I think this is the best we can do right now, and we can change it when we can do better.



-- 
To view, visit http://gerrit.cloudera.org:8080/12268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9905fe4301db8c06fe4f18318ccb04b4179a1601
Gerrit-Change-Number: 12268
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Fri, 25 Jan 2019 18:51:41 +0000
Gerrit-HasComments: Yes