
Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2017/05/18 01:13:04 UTC

[jira] [Commented] (KUDU-1853) Error during tablet copy may orphan a bunch of stuff

    [ https://issues.apache.org/jira/browse/KUDU-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015029#comment-16015029 ] 

Todd Lipcon commented on KUDU-1853:
-----------------------------------

[~mpercy] can we re-close this now for 1.4?

> Error during tablet copy may orphan a bunch of stuff
> ----------------------------------------------------
>
>                 Key: KUDU-1853
>                 URL: https://issues.apache.org/jira/browse/KUDU-1853
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet, tserver
>    Affects Versions: 1.2.0
>            Reporter: Adar Dembo
>            Assignee: Mike Percy
>            Priority: Critical
>             Fix For: 1.3.0
>
>
> Currently, a failure during tablet copy may leave behind a number of different things:
> # Downloaded superblock (if the failure falls after TabletCopyClient::Start())
> # Downloaded data blocks (if the failure falls during TabletCopyClient::FetchAll())
> # Downloaded WAL segments (if the failure falls during TabletCopyClient::FetchAll())
> # Downloaded cmeta file (if the failure falls during TabletCopyClient::Finish())
> The next time the tserver starts, it'll see that this tablet's state is still TABLET_DATA_COPYING and will tombstone it. That takes care of #1, #3, and #4 (well, it leaves the cmeta file behind as the tombstone, but that's intentional).
> Unfortunately, all downloaded data blocks (#2) are orphaned: the on-disk superblock has no record of the new blocks, so they are never deleted.
> We're already tracking a general-purpose GC mechanism for data blocks in KUDU-829, but I think this separate JIRA for describing the problem with tablet copy is useful, if only as a reference for users.
> Separately, it may be worth addressing these issues for failures that don't crash the tserver, such as intermittent network outages between tservers. A long-lived tserver won't GC for some time, and it'd be nice to reclaim the disk space used by these orphaned objects in the interim. Implementing this kind of "GC" for data blocks is also a lot easier than a general-purpose GC.
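
The copy-specific cleanup suggested in the description above could look roughly like the following. This is a minimal sketch, not Kudu's implementation: InMemoryBlockStore, CopySession, and copy_tablet are hypothetical names invented for illustration, and a ConnectionError stands in for an intermittent network outage between tservers. The idea is only that the copy session remembers every block it creates, so a non-crash failure can delete them right away instead of waiting for a general-purpose GC.

```python
class InMemoryBlockStore:
    """Hypothetical stand-in for a tserver's block storage (not a Kudu API)."""

    def __init__(self):
        self.blocks = {}
        self._next_id = 0

    def create_block(self, data):
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id

    def delete_block(self, block_id):
        self.blocks.pop(block_id, None)


class CopySession:
    """Tracks blocks created by one tablet copy so a failure can undo them."""

    def __init__(self, store):
        self.store = store
        self.new_block_ids = []
        self.finished = False

    def fetch_block(self, data):
        # Record the block ID as soon as the block exists locally.
        block_id = self.store.create_block(data)
        self.new_block_ids.append(block_id)
        return block_id

    def finish(self):
        # After a successful finish the blocks belong to the tablet.
        self.finished = True

    def abort(self):
        # Delete only blocks this session created; never touch a finished copy.
        if self.finished:
            return
        for block_id in self.new_block_ids:
            self.store.delete_block(block_id)
        self.new_block_ids.clear()


def copy_tablet(store, remote_blocks):
    """Copy every block from remote_blocks; roll back if the source fails."""
    session = CopySession(store)
    try:
        for data in remote_blocks:
            session.fetch_block(data)
        session.finish()
        return True
    except ConnectionError:
        session.abort()
        return False
```

A source that fails partway through leaves the store exactly as it was before the copy started, while blocks owned by other tablets are untouched. This does not cover the crash case, where the in-memory list of new block IDs is lost; that case still needs either a persisted record of the new blocks or the general-purpose GC tracked in KUDU-829.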



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)