Posted to issues@kudu.apache.org by "Andrew Wong (Jira)" <ji...@apache.org> on 2019/08/27 22:47:00 UTC

[jira] [Resolved] (KUDU-2907) Add directories to a directory group when there is no space left in a given directory group, but there are directories available

     [ https://issues.apache.org/jira/browse/KUDU-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wong resolved KUDU-2907.
-------------------------------
    Fix Version/s: 1.11.0
       Resolution: Fixed

Fixed this as [ae80d16|https://github.com/apache/kudu/commit/ae80d16037a5df14b4d35a76655564dae7dffe24]

> Add directories to a directory group when there is no space left in a given directory group, but there are directories available
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-2907
>                 URL: https://issues.apache.org/jira/browse/KUDU-2907
>             Project: Kudu
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Andrew Wong
>            Priority: Major
>             Fix For: 1.11.0
>
>
> We've seen a tablet server crash because of disk space issues even though the server as a whole still had space: only some of its disks were full.
> {code:java}
> W0726 10:50:58.608566 41367 tablet_replica_mm_ops.cc:144] T d29679efebf94ccb9ed8de7daa44f3ef P 649f3f936e204410a62156f322ac6f90: failed to flush MRS: IO error: Failed to open DiskRowSet for flush: Unable to open output file for column cluster_id[string NOT NULL]: No directories available to add to d29679efebf94ccb9ed8de7daa44f3ef's directory group (11 dirs total, 4 full, 0 failed). (error 28)
> F0726 10:50:58.608582 41367 tablet_replica_mm_ops.cc:145] Check failed: tablet->HasBeenStopped() FlushMRS failure is only allowed if the tablet is stopped first
> {code}
> Note that the error message is a red herring: the failure really came from selecting a directory in which to place a container, not from selecting a directory to add to the directory group.
> There were 4 full disks; presumably the tablet had a default directory group size of 3, and all of its directories were full.
> It would be nice for directory groups to be dynamically resized as needed. If getting a directory for block placement yields an ENOSPC, we should consider adding a directory to the directory group based on available space or based on the number of replicas in the remaining directories.
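
For illustration only, the resizing idea described above might look roughly like the following C++17 sketch. It is not taken from the Kudu source or from the commit referenced in this message: Dir, MostSpace, and PickDirForBlock are hypothetical names, a directory is treated as full when it reports no available bytes, and only the "based on available space" policy is shown (the replica-count policy is left out).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a data directory; not Kudu's real class.
struct Dir {
  std::string path;
  int64_t available_bytes;  // assumed: <= 0 means the directory is full
};

// Returns the index (into `dirs`) of the non-full candidate with the most
// available space, or std::nullopt if every candidate is full.
std::optional<size_t> MostSpace(const std::vector<Dir>& dirs,
                                const std::vector<size_t>& candidates) {
  std::optional<size_t> best;
  for (size_t idx : candidates) {
    if (dirs[idx].available_bytes <= 0) continue;  // skip full directories
    if (!best || dirs[idx].available_bytes > dirs[*best].available_bytes) {
      best = idx;
    }
  }
  return best;
}

// Picks a directory for a new block. `group` holds the indices of the
// tablet's current directory group. If every directory in the group is full,
// the group is grown by one directory chosen from outside it. Returns
// std::nullopt when no directory anywhere has space (the ENOSPC case).
std::optional<size_t> PickDirForBlock(const std::vector<Dir>& dirs,
                                      std::vector<size_t>* group) {
  // Normal path: place the block inside the existing group.
  if (auto in_group = MostSpace(dirs, *group)) return in_group;

  // The group is exhausted: collect the directories not yet in it.
  std::vector<size_t> outside;
  for (size_t i = 0; i < dirs.size(); ++i) {
    if (std::find(group->begin(), group->end(), i) == group->end()) {
      outside.push_back(i);
    }
  }

  // Grow the group with the emptiest outside directory, if there is one.
  auto added = MostSpace(dirs, outside);
  if (added) group->push_back(*added);
  return added;
}
```

With a group of three full directories and two further directories that still have space, PickDirForBlock adds the emptier of the two to the group and returns its index instead of failing. It reports no space only when every directory is full.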



--
This message was sent by Atlassian Jira
(v8.3.2#803003)