Posted to issues@kudu.apache.org by "Grant Henke (JIRA)" <ji...@apache.org> on 2017/04/04 17:37:42 UTC

[jira] [Resolved] (KUDU-1755) Improve tablet disk space estimation

     [ https://issues.apache.org/jira/browse/KUDU-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke resolved KUDU-1755.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.3.0

> Improve tablet disk space estimation
> ------------------------------------
>
>                 Key: KUDU-1755
>                 URL: https://issues.apache.org/jira/browse/KUDU-1755
>             Project: Kudu
>          Issue Type: Bug
>          Components: supportability, tablet
>    Affects Versions: 1.1.0
>            Reporter: Adar Dembo
>            Assignee: Grant Henke
>            Priority: Critical
>             Fix For: 1.3.0
>
>
> (Prompted by [this user post|http://mail-archives.apache.org/mod_mbox/kudu-user/201611.mbox/%3Ctencent_201BBF963FB5CB2D7AF99E25%40qq.com%3E])
> The on-disk size of tablets as reported by the Kudu web UI omits some minor as well as some major sources of space consumption. I'm listing them all here for posterity.
> # Bloom file and composite index file usage. According to [this gerrit|https://gerrit.sjc.cloudera.com/#/c/6070/] (warning: internal link), these are omitted because the rowset estimate is also used to determine how much IO compacting that rowset would generate, and bloom/composite index files aren't touched in compaction.
> # UNDO file usage. This seems like a more glaring omission, especially for mutation-heavy workloads like the one reported on the mailing list. However, the current REDO-only estimate feeds into the maintenance manager's major delta compaction decisions, so there may be a good reason for this omission too.
> # Log block manager block size rounding. The LBM rounds each Kudu block up to the next multiple of the filesystem block size to improve space reclamation via hole punching. A side effect is that some space is lost to external fragmentation.
> # Log block manager metadata overhead. Every container has a .metadata file, and we don't factor that into space utilization.
> # Other files, such as the tablet superblock, WAL segments, and cmeta.
> I expect the first two items to be the largest, so we should work on addressing them. Let's decouple the UI-based estimate from the maintenance manager (MM) path so that our reporting can be more accurate while still allowing the MM to make good decisions.
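
The arithmetic behind the description above can be sketched as follows. This is only an illustration in Python, not Kudu code: the names `round_up_to_fs_block` and `TabletSpace`, the field names, and the 4096-byte default block size are all hypothetical, and treating the maintenance manager's input as "base data plus REDO deltas" is an assumption drawn from the wording of the issue rather than from the implementation.

```python
from dataclasses import dataclass


def round_up_to_fs_block(size_bytes: int, fs_block_size: int = 4096) -> int:
    """Round a block size up to the next multiple of the filesystem block size.

    The 4096-byte default is an assumption; the real value depends on the filesystem.
    """
    if size_bytes < 0 or fs_block_size <= 0:
        raise ValueError("size must be >= 0 and block size must be > 0")
    # Ceiling division, then scale back up to bytes.
    return -(-size_bytes // fs_block_size) * fs_block_size


@dataclass
class TabletSpace:
    """Hypothetical per-tablet byte counts, one field per source named in the issue."""
    base_data: int
    redo_deltas: int
    undo_deltas: int
    bloom_and_index: int
    lbm_metadata: int
    other_files: int  # tablet superblock, WAL segments, cmeta

    def compaction_estimate(self) -> int:
        # Assumed input for compaction decisions: only data that compaction rewrites.
        return self.base_data + self.redo_deltas

    def on_disk_estimate(self) -> int:
        # Reporting estimate: every source of space consumption is counted.
        return (self.base_data + self.redo_deltas + self.undo_deltas
                + self.bloom_and_index + self.lbm_metadata + self.other_files)
```

Keeping the two methods separate mirrors the proposal to decouple the two paths: the reporting figure can grow to cover more sources without changing the number that drives compaction decisions.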



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)