Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/10/05 01:14:21 UTC

[kudu-CR] KUDU-1567. Decouple hard-minimum WAL segment retention from target

Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4470

to look at the new patch set (#5).

Change subject: KUDU-1567. Decouple hard-minimum WAL segment retention from target
......................................................................

KUDU-1567. Decouple hard-minimum WAL segment retention from target

This changes how the "minimum log segments to retain" setting is used.
Previously, the maintenance manager considered it high priority to flush
any in-memory store which was retaining more than this number of log
segments. With the default log_min_segments_to_retain=2, this caused the
maintenance manager to trigger very small flushes (128MB) regardless of
the value of flush_threshold_mb. The end result was high write
amplification.
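A minimal C++ sketch can make the old trigger concrete. It is a model of the behavior described above, not Kudu's code. The function names are hypothetical, and the 64MB segment size is an assumption inferred from the 128MB figure for 2 segments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the pre-patch flush trigger; not Kudu's API.
// Assumption: 64MB segments (2 segments ~= the 128MB quoted above).

// Whole segments anchored by a store whose oldest unflushed op lies
// `unflushed_log_mb` back in the log (rounded up).
int64_t SegmentsAnchored(int64_t unflushed_log_mb, int64_t segment_size_mb) {
  assert(segment_size_mb > 0);
  return (unflushed_log_mb + segment_size_mb - 1) / segment_size_mb;
}

// Old behavior: anchoring more than log_min_segments_to_retain segments
// was enough to trigger a flush, whatever flush_threshold_mb said.
bool OldPolicyTriggersFlush(int64_t unflushed_log_mb,
                            int64_t mrs_size_mb,
                            int64_t segment_size_mb,
                            int64_t min_segments_to_retain,
                            int64_t flush_threshold_mb) {
  if (SegmentsAnchored(unflushed_log_mb, segment_size_mb) >
      min_segments_to_retain) {
    return true;  // segment-count trigger fires long before the size threshold
  }
  return mrs_size_mb >= flush_threshold_mb;
}
```

Under these assumptions, a store anchoring 129MB of log already triggers a flush with a minimum of 2 segments, far below a (hypothetical) 1024MB flush threshold. Raising the minimum to 50 suppresses that trigger, which is the experiment described next.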

Testing with -log_min_segments_to_retain=50 indicated that removing this
aggressive flush behavior could improve write performance by about 2x and
reduce write amplification by about 1.7x.

However, raising 'log_min_segments_to_retain' to 50 also had the
unfortunate side effect of _always_ retaining 50 segments, regardless of
whether those were actually necessary for durability. In a long-running
cluster, most tablets do not receive writes at such a high rate, and
retaining 50 segments would mean unnecessary disk usage as well as
longer startup times in the absence of a solution to KUDU-38.

Thus, this patch takes the approach of decoupling the two ideas into two
separate configurations:

1) the original 'log_min_segments_to_retain', which can be left very
   low, and now is really only useful for things like post-mortem
   debugging. A future commit could change this to 1 or possibly even 0.

2) a new 'maintenance_manager_target_log_replay_size_mb' flag, which
   specifies the amount of retained log data at which the maintenance
   manager (MM) should schedule flushes of in-memory stores.
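A minimal sketch of how the two settings stay independent. It assumes, hypothetically, that GC keeps the larger of the hard floor and whatever durability requires, and that the flush trigger compares retained, replayable log data against the target. The struct and function names are illustrative, not Kudu's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of the two decoupled knobs; not Kudu's code.
struct LogPolicy {
  int64_t min_segments_to_retain;  // models log_min_segments_to_retain
  int64_t target_replay_size_mb;   // models maintenance_manager_target_log_replay_size_mb
};

// Knob 1: a hard floor on retention. GC keeps at least the floor, and
// beyond that only what durability needs.
int64_t SegmentsToRetain(const LogPolicy& p,
                         int64_t segments_needed_for_durability) {
  return std::max(p.min_segments_to_retain, segments_needed_for_durability);
}

// Knob 2: a flush trigger. The MM schedules a flush once the log data
// that would need replaying reaches the target; the floor plays no part.
bool ShouldScheduleFlush(const LogPolicy& p, int64_t retained_replay_mb) {
  return retained_replay_mb >= p.target_replay_size_mb;
}
```

For example, with `LogPolicy p{2, 1024}`, 128MB of replayable log no longer triggers a flush, while GC still keeps at least 2 segments.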

With the new defaults, we should have the following behavior:
- a MemRowSet (MRS) can fill up until the logs reach 1GB. At that point,
  the MM will begin flushing.
- after a flush, the logs will be GCed down to 2 segments.
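A toy simulation can show the expected sawtooth. All of the following are hypothetical assumptions, not Kudu behavior: writes arrive one full 64MB segment at a time, every appended segment stays anchored by the MRS until the MM flushes it, and GC then keeps max(anchored, minimum) segments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model of log size under the new defaults; not Kudu code.
struct SimResult {
  int64_t flushes = 0;
  int64_t peak_on_disk_mb = 0;
  int64_t on_disk_after_last_gc_mb = -1;  // -1 if no flush happened
};

SimResult Simulate(int64_t total_written_mb, int64_t segment_mb,
                   int64_t min_segments, int64_t target_replay_mb) {
  assert(segment_mb > 0 && min_segments >= 0);
  SimResult r;
  int64_t anchored_segments = 0;  // segments holding unflushed MRS data
  for (int64_t written = 0; written < total_written_mb;
       written += segment_mb) {
    ++anchored_segments;  // each step appends one full segment of writes
    // GC can never drop below the floor or below what is still anchored.
    int64_t on_disk_mb = std::max(anchored_segments, min_segments) * segment_mb;
    r.peak_on_disk_mb = std::max(r.peak_on_disk_mb, on_disk_mb);
    if (anchored_segments * segment_mb >= target_replay_mb) {
      // The MM flushes the MRS; nothing is anchored, so GC keeps the floor.
      ++r.flushes;
      anchored_segments = 0;
      r.on_disk_after_last_gc_mb = min_segments * segment_mb;
    }
  }
  return r;
}
```

With the defaults described above, `Simulate(4096, 64, 2, 1024)` flushes 4 times, peaks at 1024MB, and drops back to 128MB after each flush. Re-running it with a 50-segment minimum (the old workaround) keeps 3200MB on disk even right after a flush, which is the disk-usage cost described earlier.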

As follow-on work, we can consider the following ideas:
- log_min_segments_to_retain no longer determines when flushes happen,
  so it would be safe to set it to 0 (i.e. only retain the in-progress
  log). However, this will likely need a bit more stress testing and will
  require updating various tests.
- along the same lines, we can consider adding functionality to the
  log such that, if a tablet hasn't received writes in a long time
  and the log size is greater than some threshold, it would perform
  an "early roll" and not preallocate the next segment. This could
  save disk space for "inactive" tablets as well as decrease startup
  time.
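The proposed "early roll" might look roughly like the following. It only sketches an idea the message leaves open. The idle and size thresholds, the struct, and the function are hypothetical, and the actual log-roll machinery is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decision for the proposed "early roll"; not Kudu code.
struct RollDecision {
  bool roll_now;
  bool preallocate_next;
};

RollDecision DecideEarlyRoll(int64_t seconds_since_last_write,
                             int64_t log_size_mb,
                             int64_t idle_threshold_s,
                             int64_t size_threshold_mb) {
  const bool idle = seconds_since_last_write >= idle_threshold_s;
  const bool big_enough = log_size_mb > size_threshold_mb;
  if (idle && big_enough) {
    // Inactive tablet with enough log: roll now and skip preallocating
    // the next segment.
    return {true, false};
  }
  // Otherwise keep the normal behavior: no early roll, preallocate as usual.
  return {false, true};
}
```

Gating on both conditions means a busy tablet, or an idle one with almost no log, keeps the normal rolling behavior.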

Change-Id: I31400e2200f9ce3eeb63f4bc948bc630e8c1115f
---
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
M src/kudu/tablet/tablet_peer_mm_ops.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
12 files changed, 177 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/4470/5
-- 

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I31400e2200f9ce3eeb63f4bc948bc630e8c1115f
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>