You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Anupama Gupta (Code Review)" <ge...@cloudera.org> on 2018/06/04 08:38:05 UTC

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Anupama Gupta has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10591


Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/recover_tablet-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
6 files changed, 201 insertions(+), 36 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 1
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10591 )

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................


Patch Set 7:

Overall looks good, just some minor comments


-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 7
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Thu, 07 Jun 2018 01:22:50 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#8).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
5 files changed, 143 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/8
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 8
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#12).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps after the 
next tablet copy.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
5 files changed, 144 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/12
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 12
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10591 )

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................


Patch Set 14: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 14
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 11 Jun 2018 19:28:44 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10591 )

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Reviewed-on: http://gerrit.cloudera.org:8080/10591
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <mp...@apache.org>
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
5 files changed, 139 insertions(+), 38 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Mike Percy: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 15
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#6).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/recover_tablet-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
6 files changed, 206 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/6
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 6
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#5).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/recover_tablet-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
6 files changed, 206 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/5
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 5
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10591 )

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................


Patch Set 12:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/consensus/log.cc@997
PS12, Line 997: < "
add kLogPrefix here too


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc
File src/kudu/integration-tests/recover_tablet-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@133
PS7, Line 133: 
> Done. Although the test works even for cases when the deleted log segment i
OK, feel free to riff on that, it was just a suggestion of the type of comment that will be useful to people reading the code in the future.


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc
File src/kudu/integration-tests/ts_recovery-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@88
PS12, Line 88: namespace tserver {
There's no need to change the namespace of this test to access stuff in the tserver namespace (just add a using directive for anything you need to import)


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@149
PS12, Line 149: 4
s/4/3/


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@152
PS12, Line 152:   // Start a cluster with 3 tablet servers consisting of 1 tablet with 3 replicas
Please put a period (or appropriate punctuation) at the end of this sentence, and every other comment sentence in this patch.


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@154
PS12, Line 154:   // The log segment size and the number of log segments to retain is configured to be very small
This explanation states the obvious for anyone familiar with Kudu looking at the code. Please explain why we are doing that. The reason is so that we will be able to quickly fill up several log segments in order to corrupt them in a way that exercises the necessary code paths for this regression test.


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@166
PS12, Line 166:   write_workload.set_payload_bytes(5 * 1024); // Write ops of size 5MB to quickly fill the logs.
5 MB seems excessive when the log segments are only 1MB. However in reality these ops are configured to be 5KB here. Consider making them 512 * 1024 bytes to speed up the test.


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@171
PS12, Line 171: 
nit: leave only one blank line, not two


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@227
PS12, Line 227:         WaitUntilTabletRunning(ts_details, tablet_id, MonoDelta::FromSeconds(60)));
Add a comment that this needs to be at least 60 seconds to not be flaky when running in TSAN mode.


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@230
PS12, Line 230: Ensures
nit: s/Ensures/Ensure/


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@238
PS12, Line 238: 
nit: remove extra blank line



-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 12
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 08 Jun 2018 18:40:56 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#14).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
5 files changed, 139 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/14
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 14
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#10).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
5 files changed, 144 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/10
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 10
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#4).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/recover_tablet-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
6 files changed, 198 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/4
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 4
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#3).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/recover_tablet-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
6 files changed, 203 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/3
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 3
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#11).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
5 files changed, 144 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/11
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 11
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10591 )

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................


Patch Set 12:

(23 comments)

Thanks for the useful comments Mike. I have addressed all of them.

http://gerrit.cloudera.org:8080/#/c/10591/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10591/7//COMMIT_MSG@18
PS7, Line 18:  
> after the next tablet copy.
Done


http://gerrit.cloudera.org:8080/#/c/10591/10/src/kudu/consensus/log.h
File src/kudu/consensus/log.h:

http://gerrit.cloudera.org:8080/#/c/10591/10/src/kudu/consensus/log.h@145
PS10, Line 145: * 
> nit: don't make these changes, keep the modifiers next to the type to be co
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.h
File src/kudu/consensus/log.h:

http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.h@143
PS7, Line 143: ,
> add: " if it exists"
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.h@145
PS7, Line 145: RemoveRecoveryDir
> How about renaming this to RemoveRecoveryDirIfExists() ?
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc@996
PS7, Line 996: !fs_
> nit: this syntax is neat, but for consistency with the rest of the Kudu C++
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc@997
PS7, Line 997: b
> nit: extra space, also make this a VLOG(1) to avoid spamming the server log
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc@1002
PS7, Line 1002: con
> All of these logs should also log the prefix "T <tablet id> P <peer uuid>: 
Makes sense. Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc@1005
PS7, Line 1005: str
> All of these LOG messages used to be VLOG_WITH_PREFIX(1) and I think we sho
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc@1006
PS7, Line 1006: VLOG(1) << kLogPrefi
> nit: line up your indentation with the previous << here and elsewhere
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc
File src/kudu/integration-tests/recover_tablet-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@59
PS7, Line 59: 
> nit: Regression test for KUDU-1038.
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@60
PS7, Line 60: 
> or is not fsynced before a crash
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@61
PS7, Line 61: 
            : 
            : 
> slight rewrite:
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@64
PS7, Line 64: 
            : 
> suggestion:
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@66
PS7, Line 66: 
> Can you move this test to ts_recovery-itest and make this part of TsRecover
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@79
PS7, Line 79: 
> This actually just creates a new tablet.
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@87
PS7, Line 87: 
> nit: missing trailing period per style guide per https://google.github.io/s
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@96
PS7, Line 96: 
            : 
            : 
> you can get rid of this
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@107
PS7, Line 107: 
> remove
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@111
PS7, Line 111: 
> remove this debugging log message
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@117
PS7, Line 117: 
> add a period at the end of your comment sentences, here and elsewhere in th
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@133
PS7, Line 133: 
> This comment describes the next line of code and is basically superfluous, 
Done. Although the test works even for cases when the deleted log segment is at index 1.


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@141
PS7, Line 141: 
> remove this comment
Done


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@144
PS7, Line 144: 
> remove this sleep
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 12
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 08 Jun 2018 17:15:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#7).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap)

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/recover_tablet-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
6 files changed, 206 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/7
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 7
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#9).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
5 files changed, 143 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/9
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 9
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#13).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
5 files changed, 140 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/13
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 13
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Anupama Gupta has posted comments on this change. ( http://gerrit.cloudera.org:8080/10591 )

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................


Patch Set 14:

(10 comments)

Please review the changes.

http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/consensus/log.cc@997
PS12, Line 997: < k
> add kLogPrefix here too
Done


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc
File src/kudu/integration-tests/ts_recovery-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@88
PS12, Line 88: 
> There's no need to change the namespace of this test to access stuff in the
Done


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@149
PS12, Line 149: A
> s/4/3/
Done


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@152
PS12, Line 152:   vector<string> flags;
> Please put a period (or appropriate punctuation) at the end of this sentenc
Done


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@154
PS12, Line 154:   // that is done to be able to quickly fill up the log segments in order to corrupt them
> This explanation states the obvious for anyone familiar with Kudu looking a
Done


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@166
PS12, Line 166:   TestWorkload write_workload(cluster_.get());
> 5 MB seems excessive when the log segments are only 1MB. However in reality
Sorry for the typo in the comment. I changed this to 32 KB to speed up the test as the maximum allowable value in this case is around 65Kb.


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@171
PS12, Line 171: 
> nit: leave only one blank line, not two
Done


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@227
PS12, Line 227:   }
> Add a comment that this needs to be at least 60 seconds to not be flaky whe
Done


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@230
PS12, Line 230: sensusS
> nit: s/Ensures/Ensure/
Done


http://gerrit.cloudera.org:8080/#/c/10591/12/src/kudu/integration-tests/ts_recovery-itest.cc@238
PS12, Line 238: // that are referenced by a tablet will not be reused.
> nit: remove extra blank line
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 14
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 09 Jun 2018 04:35:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10591 )

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................


Patch Set 10:

(1 comment)

Not sure if this is ready for re-review, I just took a quick look and it seems the review feedback has been addressed. If so please respond to the review feedback on a per-item basis. Thanks!

http://gerrit.cloudera.org:8080/#/c/10591/10/src/kudu/consensus/log.h
File src/kudu/consensus/log.h:

http://gerrit.cloudera.org:8080/#/c/10591/10/src/kudu/consensus/log.h@145
PS10, Line 145:  *
nit: don't make these changes, keep the modifiers next to the type to be consistent w/ the rest of the Kudu code base



-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 10
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Fri, 08 Jun 2018 07:02:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory,if any

Posted by "Anupama Gupta (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10591

to look at the new patch set (#2).

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory,if any
......................................................................

KUDU-1038 Deleting a tablet should also delete its log recovery directory,if any

This fix ensures successful bootstrap of a failed tablet replica, in the case
when its WAL segments are missing/deleted. The following scenario is tested:
1. A log segment is deleted from one of the Tablet Replicas.

2. On server restart, the replica in step 1. enters a failed state.
( Without this fix, due to the presence of log recovery directory (with deleted
log segments), the replica remains in a FAILED state and fails to bootstrap )

3. This fix deletes the log recovery directory when the tablet data is deleted,
as a result of which the failed replica successfully bootstraps.

Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
---
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/recover_tablet-itest.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tserver/ts_tablet_manager.cc
6 files changed, 201 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/10591/2
-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 2
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10591 )

Change subject: KUDU-1038 Deleting a tablet should also delete its log recovery directory, if any
......................................................................


Patch Set 7:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/10591/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10591/7//COMMIT_MSG@18
PS7, Line 18: .
after the next tablet copy.


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.h
File src/kudu/consensus/log.h:

http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.h@143
PS7, Line 143: .
add: " if it exists"


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.h@145
PS7, Line 145: RemoveRecoveryDir
How about renaming this to RemoveRecoveryDirIfExists() ?


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc@996
PS7, Line 996: not 
nit: this syntax is neat, but for consistency with the rest of the Kudu C++ code base please use "!"


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc@997
PS7, Line 997:  
nit: extra space, also make this a VLOG(1) to avoid spamming the server logs


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc@1002
PS7, Line 1002: LOG
All of these logs should also log the prefix "T <tablet id> P <peer uuid>: " with them, which is what LOG_WITH_PREFIX() normally does with non-static methods, so you can calculate that once at the top like:

  const auto kLogPrefix = Substitute("T $0 P $1: ", tablet_id, fs_manager->permanent_uuid());

Then LOG or VLOG like:

  LOG(INFO) << kLogPrefix << "some message";


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc@1005
PS7, Line 1005: LOG
All of these LOG messages used to be VLOG_WITH_PREFIX(1) and I think we should put them back to be VLOG(1) with kLogPrefix in there.


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/consensus/log.cc@1006
PS7, Line 1006:                     
nit: line up your indentation with the previous << here and elsewhere


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc
File src/kudu/integration-tests/recover_tablet-itest.cc:

http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@43
PS7, Line 43: 
nit: remove multiple extra lines (one empty line is good)


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@59
PS7, Line 59: KUDU-1038
nit: Regression test for KUDU-1038.


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@60
PS7, Line 60: .
or is not fsynced before a crash


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@61
PS7, Line 61: On server restart, the replica with the deleted log segment enters a failed state.
            : // (In the presence of the WAL recovery dir (with deleted log segments),
            : // the replica enters a FAILED state and fails to bootstrap on subsequent attempts)
slight rewrite:

On server restart, the replica with the deleted log segment enters a FAILED state after being unable to complete replay of the WAL and leaves the WAL recovery directory in place.


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@64
PS7, Line 64: 4. This test shows that the failed replica successfully bootstraps if the recovery dir
            : // is deleted(if it exists) when we delete (tombstone) the tablet data
suggestion:

The master should tombstone the FAILED replica, causing its recovery directory to be deleted. A subsequent tablet copy and tablet bootstrap should cause the replica to become healthy again.


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@66
PS7, Line 66: TEST_F(TabletRecoveryITest, TestTabletRecoveryAfterSegmentDelete) {
Can you move this test to ts_recovery-itest and make this part of TsRecoveryITest?


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@79
PS7, Line 79: Write data to the tablet.
This actually just creates a new tablet.


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@87
PS7, Line 87: id
nit: missing trailing period per style guide per https://google.github.io/styleguide/cppguide.html#Punctuation,_Spelling_and_Grammar


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@96
PS7, Line 96:   while (write_workload.rows_inserted() < 100) {
            :     SleepFor(MonoDelta::FromMilliseconds(10));
            :   }
you can get rid of this


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@107
PS7, Line 107:   LOG(INFO) << "Stopping workload...";
remove


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@111
PS7, Line 111:   LOG(INFO) << "Getting cstate...";
remove this debugging log message


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@117
PS7, Line 117:   // Shutdown the cluster
add a period at the end of your comment sentences, here and elsewhere in this file


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@133
PS7, Line 133:     // Delete WAL segment at index 2
This comment describes the next line of code and is basically superfluous, consider saying why we are doing this, i.e.:

  // Delete a WAL segment in the middle of other log segments so at tablet startup time we will detect out-of-order WAL segments during log replay and fail to bootstrap.


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@141
PS7, Line 141:   //Restart the server
remove this comment


http://gerrit.cloudera.org:8080/#/c/10591/7/src/kudu/integration-tests/recover_tablet-itest.cc@144
PS7, Line 144:   SleepFor(MonoDelta::FromSeconds(3));
remove this sleep



-- 
To view, visit http://gerrit.cloudera.org:8080/10591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If3f017caf41ac4e00bf89421d6e73924b9fdcbd0
Gerrit-Change-Number: 10591
Gerrit-PatchSet: 7
Gerrit-Owner: Anupama Gupta <an...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Thu, 07 Jun 2018 01:22:33 +0000
Gerrit-HasComments: Yes