You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2019/05/22 23:29:59 UTC

[kudu-CR] [backup] Add a tool to clean up old backups

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13405


Change subject: [backup] Add a tool to clean up old backups
......................................................................

[backup] Add a tool to clean up old backups

This adds a tool that cleans up old backups. Eligibility for cleanup is
a property of the backup path, not the actual backup. There are two
conditions:
1. The path is not the restore path.
2. Every backup on the path is older than the expiration age, which the
   tool allows users to express in days. The default is 30 days.
If a path satisfies both conditions, the tool will delete its backups,
from most recent to least recent. This way, if the tool terminates
mid-delete, the next run of the tool can pick up where the previous one
left off.

The tool has a dry run mode, a verbose mode, and a table name filter. It
does not normally output anything on success.

There are a couple of situations not currently handled by this tool:
1. Failed backups with no metadata.
2. Backup paths with no full backup at the beginning.
This is because the backup graph abstraction doesn't presently handle
these cases in a way the tool can use. The tool can be further improved
after a follow up to enhance the backup graph.

Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
---
M java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala
A java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
M java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCLI.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestUtils.scala
5 files changed, 317 insertions(+), 28 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/13405/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13405

to look at the new patch set (#6).

Change subject: [backup] Add a tool to clean up old backups
......................................................................

[backup] Add a tool to clean up old backups

This adds a tool that cleans up old backups. Eligibility for cleanup is
a property of the backup path, not the actual backup. There are two
conditions:
1. The path is not the restore path.
2. Every backup on the path is older than the expiration age, which the
   tool allows users to express in days. The default is 30 days.
If a path satisfies both conditions, the tool will delete its backups,
from most recent to least recent. This way, if the tool terminates
mid-delete, the next run of the tool can pick up where the previous one
left off. However, a backup on the restore path will not be deleted,
even if it lies on a path that is expired.

The tool has a dry run mode, a verbose mode, and a table name filter. It
does not normally output anything on success.

There are a couple of situations not currently handled by this tool:
1. Failed backups with no metadata.
2. Backup paths with no full backup at the beginning.
This is because the backup graph abstraction doesn't presently handle
these cases in a way the tool can use. The tool can be further improved
after a follow up to enhance the backup graph.

Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
---
M java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala
A java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
M java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCLI.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestUtils.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
7 files changed, 337 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/13405/6
-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/13405 )

Change subject: [backup] Add a tool to clean up old backups
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
File java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@118
PS2, Line 118:     // Finally, add a third path which is not the restore path but which has backups that are old
> Please check out my divergent descendant paths part of my comment, I think 
OK. I added a test for a "forked" backup off of the restore path.



-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 28 May 2019 21:44:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13405

to look at the new patch set (#5).

Change subject: [backup] Add a tool to clean up old backups
......................................................................

[backup] Add a tool to clean up old backups

This adds a tool that cleans up old backups. Eligibility for cleanup is
a property of the backup path, not the actual backup. There are two
conditions:
1. The path is not the restore path.
2. Every backup on the path is older than the expiration age, which the
   tool allows users to express in days. The default is 30 days.
If a path satisfies both conditions, the tool will delete its backups,
from most recent to least recent. This way, if the tool terminates
mid-delete, the next run of the tool can pick up where the previous one
left off. However, a backup on the restore path will not be deleted,
even if it lies on a path that is expired.

The tool has a dry run mode, a verbose mode, and a table name filter. It
does not normally output anything on success.

There are a couple of situations not currently handled by this tool:
1. Failed backups with no metadata.
2. Backup paths with no full backup at the beginning.
This is because the backup graph abstraction doesn't presently handle
these cases in a way the tool can use. The tool can be further improved
after a follow up to enhance the backup graph.

Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
---
M java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala
A java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
M java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCLI.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestUtils.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
7 files changed, 330 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/13405/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13405 )

Change subject: [backup] Add a tool to clean up old backups
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13405/3/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
File java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/3/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@108
PS3, Line 108:     options.tables.foreach { tableName =>
this breaks the options.tables.isEmpty case


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
File java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@118
PS2, Line 118:     // Finally, add a third path which is not the restore path but which has backups that are old
> Well, we actually sort of can't because backup dirs are named using only th
That's a good point. Now that we have CLI tools, it seems like we should change our naming convention, but that is a separate issue.

Having divergent descendant paths is something that we can test and while the code is currently written to avoid deleting anything in the restore path, it would be good to have test coverage that breaks if anyone ever messes up that logic in the future.



-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 28 May 2019 17:53:57 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13405

to look at the new patch set (#2).

Change subject: [backup] Add a tool to clean up old backups
......................................................................

[backup] Add a tool to clean up old backups

This adds a tool that cleans up old backups. Eligibility for cleanup is
a property of the backup path, not the actual backup. There are two
conditions:
1. The path is not the restore path.
2. Every backup on the path is older than the expiration age, which the
   tool allows users to express in days. The default is 30 days.
If a path satisfies both conditions, the tool will delete its backups,
from most recent to least recent. This way, if the tool terminates
mid-delete, the next run of the tool can pick up where the previous one
left off.

The tool has a dry run mode, a verbose mode, and a table name filter. It
does not normally output anything on success.

There are a couple of situations not currently handled by this tool:
1. Failed backups with no metadata.
2. Backup paths with no full backup at the beginning.
This is because the backup graph abstraction doesn't presently handle
these cases in a way the tool can use. The tool can be further improved
after a follow up to enhance the backup graph.

Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
---
M java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala
A java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
M java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCLI.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestUtils.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
7 files changed, 319 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/13405/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13405 )

Change subject: [backup] Add a tool to clean up old backups
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
File java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@118
PS2, Line 118:     // Finally, add a third path which is not the restore path but which has backups that are old
> Naming etc is a bit yikes, TBH. The table name situation is also tough. Get
Please check out my divergent descendant paths part of my comment, I think you may have missed it in the previous response.



-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 28 May 2019 21:24:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13405

to look at the new patch set (#3).

Change subject: [backup] Add a tool to clean up old backups
......................................................................

[backup] Add a tool to clean up old backups

This adds a tool that cleans up old backups. Eligibility for cleanup is
a property of the backup path, not the actual backup. There are two
conditions:
1. The path is not the restore path.
2. Every backup on the path is older than the expiration age, which the
   tool allows users to express in days. The default is 30 days.
If a path satisfies both conditions, the tool will delete its backups,
from most recent to least recent. This way, if the tool terminates
mid-delete, the next run of the tool can pick up where the previous one
left off. However, a backup on the restore path will not be deleted,
even if it lies on a path that is expired.

The tool has a dry run mode, a verbose mode, and a table name filter. It
does not normally output anything on success.

There are a couple of situations not currently handled by this tool:
1. Failed backups with no metadata.
2. Backup paths with no full backup at the beginning.
This is because the backup graph abstraction doesn't presently handle
these cases in a way the tool can use. The tool can be further improved
after a follow up to enhance the backup graph.

Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
---
M java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala
A java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
M java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCLI.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestUtils.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
7 files changed, 333 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/13405/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13405 )

Change subject: [backup] Add a tool to clean up old backups
......................................................................

[backup] Add a tool to clean up old backups

This adds a tool that cleans up old backups. Eligibility for cleanup is
a property of the backup path, not the actual backup. There are two
conditions:
1. The path is not the restore path.
2. Every backup on the path is older than the expiration age, which the
   tool allows users to express in days. The default is 30 days.
If a path satisfies both conditions, the tool will delete its backups,
from most recent to least recent. This way, if the tool terminates
mid-delete, the next run of the tool can pick up where the previous one
left off. However, a backup on the restore path will not be deleted,
even if it lies on a path that is expired.

The tool has a dry run mode, a verbose mode, and a table name filter. It
does not normally output anything on success.

There are a couple of situations not currently handled by this tool:
1. Failed backups with no metadata.
2. Backup paths with no full backup at the beginning.
This is because the backup graph abstraction doesn't presently handle
these cases in a way the tool can use. The tool can be further improved
after a follow up to enhance the backup graph.

Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Reviewed-on: http://gerrit.cloudera.org:8080/13405
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <mp...@apache.org>
---
M java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala
A java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
M java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCLI.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestUtils.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
7 files changed, 337 insertions(+), 30 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Mike Percy: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13405

to look at the new patch set (#4).

Change subject: [backup] Add a tool to clean up old backups
......................................................................

[backup] Add a tool to clean up old backups

This adds a tool that cleans up old backups. Eligibility for cleanup is
a property of the backup path, not the actual backup. There are two
conditions:
1. The path is not the restore path.
2. Every backup on the path is older than the expiration age, which the
   tool allows users to express in days. The default is 30 days.
If a path satisfies both conditions, the tool will delete its backups,
from most recent to least recent. This way, if the tool terminates
mid-delete, the next run of the tool can pick up where the previous one
left off. However, a backup on the restore path will not be deleted,
even if it lies on a path that is expired.

The tool has a dry run mode, a verbose mode, and a table name filter. It
does not normally output anything on success.

There are a couple of situations not currently handled by this tool:
1. Failed backups with no metadata.
2. Backup paths with no full backup at the beginning.
This is because the backup graph abstraction doesn't presently handle
these cases in a way the tool can use. The tool can be further improved
after a follow up to enhance the backup graph.

Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
---
M java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala
A java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
M java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCLI.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
A java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestUtils.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
7 files changed, 330 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/13405/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13405 )

Change subject: [backup] Add a tool to clean up old backups
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 28 May 2019 22:16:47 +0000
Gerrit-HasComments: No

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/13405 )

Change subject: [backup] Add a tool to clean up old backups
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13405/3/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
File java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/3/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@108
PS3, Line 108:       // from last to first in the path ensures that if the tool crashes partway through then a
> this breaks the options.tables.isEmpty case
Oops. It actually broke everything. I did things another way, but again it's very temporary given upcoming patches.


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
File java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@118
PS2, Line 118:     // Finally, add a third path which is not the restore path but which has backups that are old
> That's a good point. Now that we have CLI tools, it seems like we should ch
Naming etc is a bit yikes, TBH. The table name situation is also tough. Getting all graphs by names gets all graphs where some name in the graph matches instead of the latest name. Might be good to fix that so it returns only graphs for tables matching latest name. I will do that as part of the patterns patch.



-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 28 May 2019 18:13:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/13405 )

Change subject: [backup] Add a tool to clean up old backups
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala
File java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala@107
PS2, Line 107:     fs.delete(backupPath(metadata.getTableId, metadata.getTableName, metadata.getToMs), true)
> What is our crash consistency story here?
It depends on the implementation of the filesystem the backups are stored on. It deletes the directory recursively, so it's likely it could be partway done after a crash. This would be ok as long as the backup metadata is still around. If the directory is still there but the metadata is gone, then that would cause an error trying to read the backup that's supposed to be in that directory.

To handle that situation, we ought to detect and clean up directories with unusable backups, which basically means directories with no metadata. The present backup code doesn't handle that situation and it would throw.

I think that addition is a follow-up to this base tool. I added a todo where the tool calls this method.


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
File java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@90
PS2, Line 90:     // 1. The table name matches the provided names (does not apply if no names were specified).
> This will match a table that had the specified name at any time in its hist
You're right. I see the trick that the restore job uses to make sure it's only restoring tables whose current name in in the list of specified tables. I've applied the same trick here.

Don't worry too much about it, though, because it will change soon. First, when single table failures can be non-fatal, and, second, when the table specifiers can be patterns.


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@92
PS2, Line 92:     // 3. The backup is not on the current restore path.
> Can we add a TODO here for dropped tables? Maybe something like:
Done


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@102
PS2, Line 102: toInstant
> nit: lastBackupInstant might be a better name
Done


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
File java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@51
PS2, Line 51: epochMs
> nit: rename to something like epochMillisBefore()
Done


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@62
PS2, Line 62: ts
> nit: rename ts to ages maybe?
Done


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@118
PS2, Line 118:     // Finally, add a third path which is not the restore path but which has backups that are old
> Can we add a test case where there are multiple toMs / fromMs alignments? I
Well, we actually sort of can't because backup dirs are named using only the toMs, so you can't actually have a backup state on the fs that has multiple toMs alignments. Multiple fromMs alignments happen anytime there's multiple full backups, so it's tested here.

Also, if the backup graph handles these cases correctly (returns the right backup paths) then the tool will clean them correctly based on the two rules:
1. Delete backups on paths that are expired (head of path is older than expiration age).
2. Don't delete backups on the restore path regardless of whatever other path the backup is on.

The backup graph already has tests with situations like that too.



-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 28 May 2019 16:52:50 +0000
Gerrit-HasComments: Yes

[kudu-CR] [backup] Add a tool to clean up old backups

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13405 )

Change subject: [backup] Add a tool to clean up old backups
......................................................................


Patch Set 2:

(7 comments)

Nice work. One thing about this that makes me nervous in general is that it's metadata-only logic and doesn't do any verification of existence or validity of the data files. I wonder if there is something we should do along those lines for data protection reasons.

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala
File java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/BackupIO.scala@107
PS2, Line 107:     fs.delete(backupPath(metadata.getTableId, metadata.getTableName, metadata.getToMs), true)
What is our crash consistency story here?


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala
File java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@90
PS2, Line 90:     // 1. The table name matches the provided names (does not apply if no names were specified).
This will match a table that had the specified name at any time in its history, not just the latest name it had. I wonder if we should restrict it to only the latest name a table with a given id had? Either way, we should doc the precise behavior.


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@92
PS2, Line 92:     // 3. The backup is not on the current restore path.
Can we add a TODO here for dropped tables? Maybe something like:

TODO(KUDU-2827): Consider dropped tables eligible for deletion once they reach a certain age.


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/main/scala/org/apache/kudu/backup/KuduBackupCleaner.scala@102
PS2, Line 102: toInstant
nit: lastBackupInstant might be a better name


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala
File java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala:

http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@51
PS2, Line 51: epochMs
nit: rename to something like epochMillisBefore()


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@62
PS2, Line 62: ts
nit: rename ts to ages maybe?


http://gerrit.cloudera.org:8080/#/c/13405/2/java/kudu-backup-tools/src/test/scala/org/apache/kudu/backup/TestKuduBackupCleaner.scala@118
PS2, Line 118:     // Finally, add a third path which is not the restore path but which has backups that are old
Can we add a test case where there are multiple toMs / fromMs alignments? It would be nice to prune an old "twin" daughter path but I'm not sure what do to about multiple "grandparent" paths. I guess we could / should arbitrarily kill one of the ancestor paths?



-- 
To view, visit http://gerrit.cloudera.org:8080/13405
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4c4dde10ec3c3c9b2233ee2f7b13c46e963e39
Gerrit-Change-Number: 13405
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Sat, 25 May 2019 01:05:02 +0000
Gerrit-HasComments: Yes