Posted to issues@kudu.apache.org by "Will Berkeley (JIRA)" <ji...@apache.org> on 2018/10/18 06:45:00 UTC

[jira] [Commented] (KUDU-1400) Improve rowset compaction policy to consider merging small DRSs

    [ https://issues.apache.org/jira/browse/KUDU-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654712#comment-16654712 ] 

Will Berkeley commented on KUDU-1400:
-------------------------------------

Todd implemented a configurable flushing time threshold ({{--flush_threshold_secs}}) in 8d026474be, a long time ago.
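
The sketch below shows why a purely time-based trigger produces the small DRSs described in this issue. It is a hypothetical illustration, not Kudu's flush code. Only the flag name {{--flush_threshold_secs}} comes from the comment above. `MemRowSetState`, `should_flush`, and `flush_threshold_bytes` are made-up names, and the 120 s default is an assumption matching the "every 2 min" in the issue description.

```python
from dataclasses import dataclass


@dataclass
class MemRowSetState:
    """Hypothetical snapshot of a tablet's in-memory rowset (MRS)."""
    size_bytes: int
    secs_since_last_flush: float


def should_flush(state: MemRowSetState,
                 flush_threshold_secs: float = 120.0,
                 flush_threshold_bytes: int = 1024 * 1024 * 1024) -> bool:
    """Hypothetical trigger: flush a non-empty MRS once it is large enough,
    or once it has gone longer than flush_threshold_secs without a flush,
    no matter how small it is."""
    if state.size_bytes == 0:
        return False  # nothing to flush
    if state.size_bytes >= flush_threshold_bytes:
        return True   # size trigger
    return state.secs_since_last_flush >= flush_threshold_secs  # time trigger


# A 1 MB MRS on a lightly written table, idle past the 120 s threshold:
small = MemRowSetState(size_bytes=1024 * 1024, secs_since_last_flush=150.0)
print(should_flush(small))                              # True: becomes a ~1 MB DRS
print(should_flush(small, flush_threshold_secs=600.0))  # False: keeps accumulating
```

Under this assumed policy, raising the time threshold keeps more data in memory in exchange for fewer, larger DRSs. It does not help with DRSs that have already been written, which is what the compaction-policy side of this issue is about.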

I've written a [design doc|https://docs.google.com/document/d/1yTfxt0_2p5EfIjCnjJCt3o-nB9xk-Kl2O8yKTA1LQrQ/edit#heading=h.5z0d0yyd9zfk] for improvements to compaction policy that should also help with this issue.

> Improve rowset compaction policy to consider merging small DRSs
> ---------------------------------------------------------------
>
>                 Key: KUDU-1400
>                 URL: https://issues.apache.org/jira/browse/KUDU-1400
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Binglin Chang
>            Assignee: Will Berkeley
>            Priority: Major
>
> We see some small tables with light write loads generating lots of small DRSs (~1MB). Since those DRSs do not overlap much, they never get the chance to be compacted, which leaves lots of very small files/blocks. So:
> # The compaction solution value should account for the benefit of merging small DRSs.
> # Flushing the MRS every 2 minutes, whether it is small or large, seems suboptimal. Maybe flushing a small MRS should have "lower priority" than a rowset compaction with a higher solution value?
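
The first suggestion above can be sketched as a scoring function. This is a hypothetical illustration, not Kudu's compaction policy. The "height" term here is a simplified stand-in (maximum key-range overlap depth among the candidates) for an overlap-driven value. Under such a term, non-overlapping DRSs score zero and are never picked, which is the behavior the issue reports. `DiskRowSet`, `compaction_value`, `small_drs_mb`, `small_merge_weight`, and `target_mb` are all made-up names and parameters.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DiskRowSet:
    """Hypothetical DRS summary: inclusive key range and on-disk size."""
    min_key: int
    max_key: int
    size_mb: float


def max_overlap_depth(rowsets: Sequence[DiskRowSet]) -> int:
    """Largest number of rowsets whose key ranges cover a single key (sweep line)."""
    events = []
    for rs in rowsets:
        events.append((rs.min_key, 0))  # 0 = range starts; sorts before an end at the same key
        events.append((rs.max_key, 1))  # 1 = range ends (ranges are inclusive)
    depth = best = 0
    for _, kind in sorted(events):
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


def compaction_value(candidates: Sequence[DiskRowSet],
                     small_drs_mb: float = 8.0,
                     small_merge_weight: float = 0.5,
                     target_mb: float = 32.0) -> float:
    """Hypothetical value of merging `candidates` into a single rowset."""
    if len(candidates) < 2:
        return 0.0
    if sum(rs.size_mb for rs in candidates) > target_mb:
        return 0.0  # assumption: don't build outputs larger than the target size
    # Overlap term: merging removes the stacked ranges in this region.
    height_benefit = max_overlap_depth(candidates) - 1
    # Small-DRS term: merging n rowsets removes n - 1 of them; credit only small ones.
    small_count = sum(1 for rs in candidates if rs.size_mb < small_drs_mb)
    small_benefit = small_merge_weight * min(small_count, len(candidates) - 1)
    return height_benefit + small_benefit


# Four adjacent, non-overlapping ~1 MB DRSs, as in the issue description:
drss = [DiskRowSet(k, k + 9, 1.0) for k in range(0, 40, 10)]
print(compaction_value(drss, small_merge_weight=0.0))  # 0.0: overlap alone never picks them
print(compaction_value(drss))                          # 1.5: small-DRS credit makes them eligible
```

The same kind of score could support the second suggestion. A scheduler could compare a small MRS flush against the best compaction candidate and run the compaction first when its value is higher. How to weigh the two is left open here.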


