Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2018/03/26 16:36:00 UTC

[jira] [Commented] (KUDU-2372) Don't let kudu start up if any disks are mounted read-only

    [ https://issues.apache.org/jira/browse/KUDU-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414100#comment-16414100 ] 

Todd Lipcon commented on KUDU-2372:
-----------------------------------

Per KUDU-2359, I think it may make sense to allow starting up with a bad disk, so that we don't need manual intervention after a single disk failure (e.g. on a 12-disk host).

> Don't let kudu start up if any disks are mounted read-only
> ----------------------------------------------------------
>
>                 Key: KUDU-2372
>                 URL: https://issues.apache.org/jira/browse/KUDU-2372
>             Project: Kudu
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Andrew Wong
>            Priority: Major
>
> Today, if a Kudu tserver runs into EROFS (read-only mount error), it treats the error as it would a complete disk failure (EIO), allowing successful startup of the server, but failing the tablets that are configured to use the "failed" disk.
> If something is wrong with the mounting of a disk, it might be helpful to bring immediate attention to it, and have operators deal with it, rather than handling it automatically. As such, it might be helpful to prevent Kudu from starting up if errors are detected with the mount configurations.
> There are tradeoffs here to be considered:
>  * The current behavior evicts and deletes the data from the failed tablets, since the error is treated as an unrecoverable failure. The user can ignore such failures and handle them at their leisure, since Kudu will re-replicate the tablets lost in this way.
>  * If we were to crash instead, operators would get immediate feedback and a time limit to use `kudu fs update_dirs` to remove the read-only drive, or to fix the mountpoint itself.
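
The two options above can be sketched as a small startup policy. This is only an illustration, not Kudu's actual code (Kudu is written in C++): the function names, the `fail_on_read_only` switch, and the returned strings are all hypothetical. The only real calls are Python's `os.statvfs` and the standard `errno` constants, which assume a Unix-like host.

```python
import errno
import os
from typing import Dict


def is_read_only_mount(path: str) -> bool:
    """Return True if the filesystem containing `path` is mounted read-only."""
    # ST_RDONLY is the POSIX statvfs flag for a read-only mount (Unix only).
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def startup_decision(dir_errors: Dict[str, int], fail_on_read_only: bool) -> str:
    """Hypothetical policy; `dir_errors` maps data dir -> errno (0 if healthy).

    fail_on_read_only=False models the behavior described in the issue:
    EROFS is handled like EIO, so the server starts and only the affected
    tablets are failed. fail_on_read_only=True models the proposal: refuse
    to start so that operators notice the bad mount right away.
    """
    read_only = [d for d, e in dir_errors.items() if e == errno.EROFS]
    failed = [d for d, e in dir_errors.items() if e == errno.EIO]
    if read_only and fail_on_read_only:
        return "abort"           # proposed: crash and wait for an operator
    if read_only or failed:
        return "start_degraded"  # start up, fail tablets on the bad dirs
    return "start"


if __name__ == "__main__":
    # Example paths are made up for illustration.
    dirs = {"/data/1": 0, "/data/2": errno.EROFS}
    print(startup_decision(dirs, fail_on_read_only=False))
    print(startup_decision(dirs, fail_on_read_only=True))
```

Keeping the choice behind a single switch is one way to accommodate both views in this thread: a host with many disks could keep starting after a single bad disk, while a deployment that wants mount problems surfaced immediately could opt into aborting.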



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)