Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/08/19 05:58:20 UTC

[jira] [Commented] (KUDU-1567) Short default for log retention increases write amplification

    [ https://issues.apache.org/jira/browse/KUDU-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427668#comment-15427668 ] 

Todd Lipcon commented on KUDU-1567:
-----------------------------------

Increasing the log segment retention to 20 segments instead of the default of 2 increased the size of flushes substantially (so less compaction work is needed afterwards) and also let compactions run more frequently (thus reducing the blooms-per-op statistic).

The downside, of course, is that startup time will be longer. However, the most common case where someone cares about startup time is a rolling restart. We could provide a "clean shutdown" mode which (optionally) stops accepting writes, flushes all the in-memory stores, and then shuts down. This, combined with a fix for KUDU-38, would allow a planned restart to proceed quickly, since there would be little left in the logs to replay. Unplanned restarts would then be comparatively rare, and the occasional 5+ minute replay time would be no big deal.
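A minimal sketch of what that clean-shutdown sequence could look like; the class and method names here are illustrative stand-ins, not actual Kudu interfaces:

{code}
// Hypothetical clean-shutdown path for a tablet server.
class TabletServerStub {
 public:
  void RejectNewWrites() { accepting_writes_ = false; }   // step 1: quiesce
  void FlushAllMemStores() { /* flush every MRS and DMS to disk */ }
  void Shutdown() { /* the existing orderly shutdown path */ }
 private:
  bool accepting_writes_ = true;
};

// After this runs, the WALs hold almost nothing to replay, so the next
// startup is fast (given a fix for KUDU-38 as well).
void CleanShutdown(TabletServerStub* ts) {
  ts->RejectNewWrites();
  ts->FlushAllMemStores();
  ts->Shutdown();
}
{code}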

A super-cheap implementation of the above would be a gflag which tells the maintenance manager to prioritize flushes above all else. Before a planned shutdown, we could set the gflag to 'true' at runtime, wait until the in-memory stores are all flushed, and then do a normal kill. But we'd still need KUDU-38 fixed.
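A rough sketch of that gflag approach, with an assumed flag name and a simplified scoring hook; this is illustrative only, not the actual maintenance manager code:

{code}
#include <gflags/gflags.h>
#include <limits>

// Assumed flag name. In Kudu, flags intended to be changed on a live server
// are tagged runtime-settable; the name below is hypothetical.
DEFINE_bool(prioritize_flushes_for_shutdown, false,
            "If true, memory-store flushes outrank all other maintenance ops.");

// Stand-in for the maintenance manager's op type (hypothetical).
struct MaintenanceOp {
  bool is_flush;
};

// Hook in the op-selection loop: with the flag set, a flush always wins.
double AdjustedPerfScore(const MaintenanceOp& op, double base_score) {
  if (FLAGS_prioritize_flushes_for_shutdown && op.is_flush) {
    return std::numeric_limits<double>::max();
  }
  return base_score;
}
{code}

Operationally: flip the flag to true on the live tserver, wait for the in-memory stores to drain, then kill the process as usual.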

> Short default for log retention increases write amplification
> -------------------------------------------------------------
>
>                 Key: KUDU-1567
>                 URL: https://issues.apache.org/jira/browse/KUDU-1567
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, tserver
>    Affects Versions: 0.10.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> Currently the maintenance manager prioritizes flushes over compactions if the flush operations are retaining WAL segments. The goal here is to prevent the amount of in-memory data from getting so large that restarts would be incredibly slow. However, it has a somewhat unintuitive negative effect on performance:
> - with the default of retaining just two segments, flushes become highly prioritized when the MRS only has ~128MB of data, regardless of the "flush_threshold_mb" configuration
> - this creates lots of overlapping rowsets in the case of random-write applications
> - because flushes are prioritized over compactions, compactions rarely run (see the sketch after this list)
> - the frequent flushes, combined with the low priority of compactions, mean that after a few days of constant inserts, we often end up with average "bloom lookups per op" metrics of 50-100, which is quite slow even if the blooms fit in cache.
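
To make the feedback loop above concrete, here is a minimal sketch of the scoring behavior, with assumed names and constants (not Kudu's actual maintenance manager code):

{code}
// Once a memstore anchors more WAL segments than the retention minimum,
// its flush score is boosted past any compaction, regardless of
// flush_threshold_mb. Names and constants here are illustrative.
constexpr int kLogMinSegmentsToRetain = 2;  // the short default at issue

double FlushPerfScore(int anchored_wal_segments, double base_score) {
  int extra = anchored_wal_segments - kLogMinSegmentsToRetain;
  // Each segment retained beyond the minimum adds a large bonus, so flushes
  // dominate the maintenance queue and compactions rarely get to run.
  return extra > 0 ? base_score + 100.0 * extra : base_score;
}
{code}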


