Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/09/02 00:31:20 UTC

[jira] [Commented] (KUDU-1567) Short default for log retention increases write amplification

    [ https://issues.apache.org/jira/browse/KUDU-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15457066#comment-15457066 ] 

Todd Lipcon commented on KUDU-1567:
-----------------------------------

Another thought: it would be good to change the retention behavior to support the following:

- on an actively written tablet, don't worry about going up to 10-20 log segments. If someone restarts in the middle of a heavy write workload, it's reasonable to expect those tablets to recover slowly.
- when the tablet has flushed for time-based reasons and no longer needs all of those log segments, we should delete them rather than adhering to some arbitrary "min segments" floor.

In other words, the user configuration should set a target size (a soft upper bound) for the logs that need to be replayed, but not a lower bound on logs that are kept for no good reason.
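
To make this concrete, here's a minimal standalone sketch of such a retention rule (not actual Kudu code; the names SegmentInfo, needed_for_replay, etc. are hypothetical): anything no longer needed for replay is immediately deletable, with no minimum-segment floor, and the configured target would only act elsewhere as a soft upper bound used to prioritize flushes.

{code}
// Hypothetical sketch of "no lower bound" WAL retention.
#include <cstdint>
#include <iostream>
#include <vector>

struct SegmentInfo {
  int64_t size_bytes;
  bool needed_for_replay;  // holds operations not yet flushed durably elsewhere
};

// Returns how many of the oldest segments can be GCed right now.
// Nothing is kept once it is no longer needed for replay; there is no
// arbitrary "min segments to retain".
int DeletableSegmentCount(const std::vector<SegmentInfo>& segments) {
  int deletable = 0;
  for (const auto& seg : segments) {
    if (seg.needed_for_replay) break;  // must keep this segment and all newer ones
    ++deletable;
  }
  return deletable;
}

int main() {
  // Oldest first: the first two segments are fully flushed, so they can go,
  // even though a two-segment minimum-retention rule would have kept them.
  std::vector<SegmentInfo> segs = {
      {64 << 20, false}, {64 << 20, false}, {64 << 20, true}, {64 << 20, true}};
  std::cout << "deletable segments: " << DeletableSegmentCount(segs) << std::endl;
  return 0;
}
{code}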

> Short default for log retention increases write amplification
> -------------------------------------------------------------
>
>                 Key: KUDU-1567
>                 URL: https://issues.apache.org/jira/browse/KUDU-1567
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, tserver
>    Affects Versions: 0.10.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> Currently the maintenance manager prioritizes flushes over compactions if the flush operations are retaining WAL segments. The goal here is to prevent the amount of in-memory data from getting so large that restarts would be incredibly slow. However, it has a somewhat unintuitive negative effect on performance:
> - with the default of retaining just two segments, flushes become highly prioritized when the MRS only has ~128MB of data, regardless of the "flush_threshold_mb" configuration
> - this creates lots of overlapping rowsets in the case of random-write applications
> - because flushes are prioritized over compactions, compactions rarely run
> - the frequent flushes, combined with the low priority of compactions, mean that after a few days of constant inserts, we often end up with average "bloom lookups per op" metrics of 50-100, which is quite slow even if the blooms fit in cache.
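
A rough, self-contained illustration of the prioritization described in the issue text above (this is not Kudu's actual maintenance-manager scoring; the constants and names below are hypothetical): once an in-memory rowset anchors more WAL than the two-segment default allows, the flush score dominates whatever flush_threshold_mb says, so flushes of small MRSes always win and compactions are starved.

{code}
// Hypothetical scoring sketch, not Kudu's real maintenance-manager logic.
#include <algorithm>
#include <cstdint>
#include <iostream>

constexpr int64_t kSegmentBytes = 64LL << 20;        // 64 MB WAL segments
constexpr int kMaxRetainedSegments = 2;              // short retention default
constexpr int64_t kFlushThresholdBytes = 1LL << 30;  // e.g. flush_threshold_mb=1024

// WAL-retention pressure dominates the size-based flush policy.
double FlushScore(int64_t mrs_bytes, int64_t wal_anchored_bytes) {
  int64_t over_retention = std::max<int64_t>(
      0, wal_anchored_bytes - kMaxRetainedSegments * kSegmentBytes);
  double wal_pressure = static_cast<double>(over_retention) / kSegmentBytes;
  double size_pressure = static_cast<double>(mrs_bytes) / kFlushThresholdBytes;
  return 100.0 * wal_pressure + size_pressure;
}

int main() {
  // An MRS holding only ~128 MB already anchors ~3 segments of WAL, so its
  // flush score is essentially the same as a 1 GB MRS's: the size threshold
  // never gets a chance to matter, and compactions (scored ~1.0 here) lose.
  std::cout << "flush score, 128 MB MRS: "
            << FlushScore(128LL << 20, 3 * kSegmentBytes) << std::endl;
  std::cout << "flush score, 1 GB MRS:   "
            << FlushScore(1LL << 30, 3 * kSegmentBytes) << std::endl;
  return 0;
}
{code}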



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)