Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/09/13 01:02:56 UTC
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Hello Mike Percy,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/4392
to review the following change.
Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
......................................................................
KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
This fixes a bug in the way we handle tablet copies while replacing
existing tombstoned tablets:
- a tablet exists in TABLET_DATA_TOMBSTONED state
- we begin copying a new replica on top of this one
-- this calls TabletMetadata::ReplaceSuperBlock() using the remote
superblock (importantly, this remote superblock contains remote block
IDs)
- we crash mid-copy
- on restart, we see the "TABLET_DATA_COPYING" state and "roll forward"
the deletion of this tablet. However the block IDs here are the IDs from
the remote machine, and we incorrectly delete a bunch of blocks.
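The failure sequence above can be modeled in a few lines of C++. This is a toy sketch, not Kudu code. `DataState`, `SuperBlock`, `LocalBlocks`, `StartCopyBuggy` and `RecoverOnRestart` are hypothetical names, and block IDs are plain integers in a `std::set` standing in for the local block manager. It shows why persisting the remote superblock is dangerous: recovery trusts whatever block IDs the superblock lists, even though those IDs were allocated on another server.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, heavily simplified model; these are not Kudu's real types.
enum class DataState { kReady, kTombstoned, kCopying };

struct SuperBlock {
  DataState state;
  std::vector<uint64_t> block_ids;  // block IDs this superblock claims to own
};

// Stand-in for the local block manager: the block IDs stored on this server.
using LocalBlocks = std::set<uint64_t>;

// Pre-fix behavior (as modeled here): at copy start, persist the *remote*
// superblock, including block IDs that were allocated on the remote server.
void StartCopyBuggy(SuperBlock* on_disk, const SuperBlock& remote) {
  *on_disk = remote;
  on_disk->state = DataState::kCopying;
}

// On restart, a replica found in the COPYING state is "rolled forward" to
// deletion: every block ID listed in its superblock is deleted locally.
void RecoverOnRestart(SuperBlock* on_disk, LocalBlocks* local) {
  if (on_disk->state != DataState::kCopying) return;
  for (uint64_t id : on_disk->block_ids) {
    local->erase(id);  // removes whichever local block happens to have this ID
  }
  on_disk->block_ids.clear();
  on_disk->state = DataState::kTombstoned;
}
```

For example, if this server holds blocks {1, 2, 3} belonging to other, healthy tablets and the remote superblock lists {2, 3, 7}, calling `StartCopyBuggy` and then `RecoverOnRestart` (simulating a crash in between) leaves only block 1.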
This has always been an issue, but was made worse in 0.10 by the fix for
KUDU-1538. After fixing KUDU-1538, the likelihood of a remote block ID
matching a local one is quite high, whereas before we'd usually not see
this bug.
The fix here is relatively simple: rather than writing the remote
superblock to disk when starting the copy, we just change the state of
the existing superblock to indicate 'copying'.
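A minimal sketch of the fix under a hypothetical, simplified model (not Kudu's actual `TabletMetadata` API; `DataState`, `SuperBlock`, `LocalBlocks`, `StartCopyFixed` and `RecoverOnRestart` are illustrative names). At copy start only the state of the existing local superblock changes, so the roll-forward on restart only ever sees block IDs that were allocated on this server.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical simplified model; these are not Kudu's real classes.
enum class DataState { kReady, kTombstoned, kCopying };

struct SuperBlock {
  DataState state;
  std::vector<uint64_t> block_ids;  // only meaningful on the server that allocated them
};

// Stand-in for the local block manager.
using LocalBlocks = std::set<uint64_t>;

// Fixed approach: do not write the remote superblock at copy start; just mark
// the existing local superblock as copying. Its block list still refers to
// local IDs (empty for a tombstoned replica in this model).
void StartCopyFixed(SuperBlock* on_disk) {
  on_disk->state = DataState::kCopying;
}

// Roll-forward on restart: delete the blocks the persisted superblock lists.
void RecoverOnRestart(SuperBlock* on_disk, LocalBlocks* local) {
  if (on_disk->state != DataState::kCopying) return;
  for (uint64_t id : on_disk->block_ids) {
    local->erase(id);
  }
  on_disk->block_ids.clear();
  on_disk->state = DataState::kTombstoned;
}
```

With a tombstoned replica whose superblock lists no blocks, `StartCopyFixed` followed by `RecoverOnRestart` leaves the server's other blocks untouched. In this toy model, blocks already downloaded before the crash are not recorded in the local superblock, so recovery does not clean them up; the commit message does not say how the actual patch handles such leftovers.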
Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
---
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
7 files changed, 165 insertions(+), 65 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/4392/1
--
To view, visit http://gerrit.cloudera.org:8080/4392
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
......................................................................
Patch Set 3:
Build Started http://104.196.14.100/job/kudu-gerrit/3393/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: No
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.
Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
......................................................................
KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
This fixes a bug in the way we handle tablet copies while replacing
existing tombstoned tablets:
- a tablet exists in TABLET_DATA_TOMBSTONED state
- we begin copying a new replica on top of this one
-- this calls TabletMetadata::ReplaceSuperBlock() using the remote
superblock (importantly, this remote superblock contains remote block
IDs)
- we crash mid-copy
- on restart, we see the "TABLET_DATA_COPYING" state and "roll forward"
the deletion of this tablet. However the block IDs here are the IDs from
the remote machine, and we incorrectly delete a bunch of blocks.
This has always been an issue, but was made worse in 0.10 by the fix for
KUDU-1538. After fixing KUDU-1538, the likelihood of a remote block ID
matching a local one is quite high, whereas before we'd usually not see
this bug.
The fix here is relatively simple: rather than writing the remote
superblock to disk when starting the copy, we just change the state of
the existing superblock to indicate 'copying'.
Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Reviewed-on: http://gerrit.cloudera.org:8080/4392
Reviewed-by: Mike Percy <mp...@apache.org>
Tested-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
7 files changed, 165 insertions(+), 65 deletions(-)
Approvals:
Mike Percy: Looks good to me, approved
Todd Lipcon: Verified
--
Gerrit-MessageType: merged
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4392
to look at the new patch set (#2).
Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
......................................................................
KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
---
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
7 files changed, 165 insertions(+), 65 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/4392/2
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4392
to look at the new patch set (#3).
Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
......................................................................
KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
---
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tserver/tablet_copy_client.cc
M src/kudu/tserver/ts_tablet_manager.cc
7 files changed, 165 insertions(+), 65 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/4392/3
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
......................................................................
Patch Set 1:
Build Started http://104.196.14.100/job/kudu-gerrit/3385/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: No
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
......................................................................
Patch Set 3: Verified+1
Known-flaky java test
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.
Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
......................................................................
Patch Set 3: Code-Review+2
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
......................................................................
Patch Set 2:
Build Started http://104.196.14.100/job/kudu-gerrit/3392/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: No
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.
Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
......................................................................
Patch Set 2:
(5 comments)
looks good, just a few nits
http://gerrit.cloudera.org:8080/#/c/4392/2/src/kudu/integration-tests/delete_table-test.cc
File src/kudu/integration-tests/delete_table-test.cc:
PS2, Line 451: 0
kTsIndex
PS2, Line 503: 0
kTsIndex
PS2, Line 528: 0
kTsIndex
PS2, Line 546: 0
kTsIndex
http://gerrit.cloudera.org:8080/#/c/4392/1/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:
Line 892: << "of type " << TabletDataState_Name(data_state);
nit: indentation
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes
[kudu-CR] KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: KUDU-1605. Blocks can be incorrectly deleted if TS crashes mid-copy
......................................................................
Patch Set 3:
took care of all the nits
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica25c5e4e5894ea80e416d9a4ad44dd25e0c6d53
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No