Posted to issues@kudu.apache.org by "Mike Percy (JIRA)" <ji...@apache.org> on 2016/02/26 12:55:18 UTC

[jira] [Updated] (KUDU-616) Mitigate tablet damage when disks are lost

     [ https://issues.apache.org/jira/browse/KUDU-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Percy updated KUDU-616:
----------------------------
    Parent: KUDU-423

> Mitigate tablet damage when disks are lost
> ------------------------------------------
>
>                 Key: KUDU-616
>                 URL: https://issues.apache.org/jira/browse/KUDU-616
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: M5
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>
> Disk loss is an unfortunate fact of life, and Kudu should provide mechanisms for mitigating it.
> # Make it possible to isolate specific tablets to some subset of the machine's disks, so that if one disk dies it doesn't take out all the tablets with it. This is more complicated than it looks:
> ** We need a concrete way of describing disk groups. It can be per-node, or abstract enough that it makes sense across the entire cluster, or perhaps we aggregate information (e.g. ten machines have 5 disks and the other forty machines have 6 disks).
> ** This mechanism needs to be used for both data blocks and other bits of metadata (master blocks, superblocks, and other random files).
> ** Presumably it needs to be provided when a table is created (or a tablet is split), and it needs to be persisted as part of tablet metadata. It might be sufficient to express it in Kudu configuration (i.e. complex gflags), but since it can be associated with tablet metadata, it's hard to see how this would work.
> # When a disk fails, the server needs to handle it appropriately (mark it as failed, put affected tablets in a failed state, etc.).
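
The two items above can be sketched roughly as follows. This is a minimal Python illustration, not Kudu code: `pick_disk_group`, `DiskGroupRegistry`, the hash-based placement, and the directory names are all hypothetical, and a real implementation would presumably persist each group in tablet metadata rather than keep it in memory.

```python
import hashlib


def pick_disk_group(tablet_id, data_dirs, group_size):
    """Deterministically pick `group_size` directories for a tablet.

    Hypothetical placement policy: hash the tablet id to a starting
    index, then take consecutive directories (wrapping around).
    """
    if not 0 < group_size <= len(data_dirs):
        raise ValueError("group_size must be between 1 and len(data_dirs)")
    digest = hashlib.sha256(tablet_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(data_dirs)
    return [data_dirs[(start + i) % len(data_dirs)] for i in range(group_size)]


class DiskGroupRegistry:
    """In-memory stand-in for per-tablet disk group metadata (assumption)."""

    def __init__(self, data_dirs, group_size):
        self.data_dirs = list(data_dirs)
        self.group_size = group_size
        self.groups = {}             # tablet_id -> list of data directories
        self.failed_dirs = set()     # directories marked as failed
        self.failed_tablets = set()  # tablets placed in a failed state

    def add_tablet(self, tablet_id):
        """Assign a disk group to a new tablet and remember it."""
        group = pick_disk_group(tablet_id, self.data_dirs, self.group_size)
        self.groups[tablet_id] = group
        return group

    def mark_disk_failed(self, data_dir):
        """Mark a directory as failed and fail only the tablets that use it."""
        self.failed_dirs.add(data_dir)
        affected = {t for t, g in self.groups.items() if data_dir in g}
        self.failed_tablets |= affected
        return affected


if __name__ == "__main__":
    dirs = [f"/data/{i}" for i in range(6)]
    registry = DiskGroupRegistry(dirs, group_size=2)
    for n in range(10):
        registry.add_tablet(f"tablet-{n}")
    print(sorted(registry.mark_disk_failed("/data/3")))
```

With a group size smaller than the number of disks, losing one disk fails only the tablets whose group contains it. The sketch does not cover the open questions in the description, such as describing groups across a heterogeneous cluster or placing non-data metadata files.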



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)