Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/09/13 00:58:20 UTC

[jira] [Created] (KUDU-1605) Blocks can be incorrectly deleted if TS crashes mid-tablet-copy

Todd Lipcon created KUDU-1605:
---------------------------------

             Summary: Blocks can be incorrectly deleted if TS crashes mid-tablet-copy
                 Key: KUDU-1605
                 URL: https://issues.apache.org/jira/browse/KUDU-1605
             Project: Kudu
          Issue Type: Bug
          Components: tserver
    Affects Versions: 0.10.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Blocker


There's currently a bug in the way we handle tablet copies while replacing existing tombstoned tablets:

- a tablet exists in TABLET_DATA_TOMBSTONED state
- we begin copying a new replica on top of this one
-- this calls TabletMetadata::ReplaceSuperBlock() using the _remote_ superblock (importantly, this remote superblock contains remote block IDs)
- we crash mid-copy
- on restart, we see the "TABLET_DATA_COPYING" state and "roll forward" the deletion of this tablet. However, the block IDs in the superblock at this point are the IDs from the remote machine, so we incorrectly delete whichever local blocks happen to share those IDs.
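
The sequence above can be reproduced in a small toy model. This is only an illustrative sketch in Python, not Kudu's actual C++ code: the ToyTabletServer class, its begin_copy() and restart() methods, the integer block IDs, and the tablet names are all hypothetical. The only assumption carried over from the description is that the COPYING superblock holds remote block IDs and that restart deletes every block ID listed in it.

```python
from dataclasses import dataclass, field

TOMBSTONED = "TABLET_DATA_TOMBSTONED"
COPYING = "TABLET_DATA_COPYING"


@dataclass
class SuperBlock:
    """Hypothetical stand-in for a tablet superblock."""
    state: str
    block_ids: set = field(default_factory=set)


class ToyTabletServer:
    """Toy model of one tablet server's local block store (not Kudu code)."""

    def __init__(self, local_blocks):
        # Local block ID -> tablet that owns the block on this server.
        self.blocks = dict(local_blocks)
        self.superblocks = {}

    def begin_copy(self, tablet_id, remote_block_ids):
        # Models the ReplaceSuperBlock() step: the persisted superblock now
        # lists block IDs that are only meaningful on the REMOTE server.
        self.superblocks[tablet_id] = SuperBlock(COPYING, set(remote_block_ids))

    def restart(self):
        # Models the "roll forward" of the deletion: every block ID listed
        # in a COPYING superblock is deleted from the LOCAL block store.
        deleted = []
        for sb in self.superblocks.values():
            if sb.state != COPYING:
                continue
            for block_id in sorted(sb.block_ids):
                if block_id in self.blocks:
                    del self.blocks[block_id]
                    deleted.append(block_id)
            sb.block_ids.clear()
            sb.state = TOMBSTONED
        return deleted


# Local blocks 1-3 belong to other, healthy tablets on this server.
ts = ToyTabletServer({1: "tablet-B", 2: "tablet-B", 3: "tablet-C"})
ts.superblocks["tablet-A"] = SuperBlock(TOMBSTONED)

# Copy begins; the remote replica of tablet-A uses block IDs 2, 3 and 7.
ts.begin_copy("tablet-A", {2, 3, 7})

# Crash mid-copy, then restart: local blocks 2 and 3 are wrongly deleted.
deleted = ts.restart()
print(deleted)    # [2, 3]
print(ts.blocks)  # {1: 'tablet-B'}
```

In this model the damage depends entirely on how often a remote block ID coincides with a local one, which is the likelihood discussed below.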

This has always been an issue, but was made worse in 0.10 by the fix for KUDU-1538. After fixing KUDU-1538, the likelihood of a remote block ID matching a local one is quite high, whereas before we'd _usually_ not see this bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)