You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2018/12/06 22:10:15 UTC

[kudu-CR] KUDU-2618 Factor the amount of data into time-based flush decisions

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12048


Change subject: KUDU-2618 Factor the amount of data into time-based flush decisions
......................................................................

KUDU-2618 Factor the amount of data into time-based flush decisions

Previously, the perf improvement score for flush operations was
determined first by assigning a large score if the amount of RAM
anchored was over a threshold, and second by assigning a modest score
based on how long since the last flush. There was a small problem with
this, which was that if the rate of inserts was very slow, and the
cluster was generally not very busy writing, even the modest scores for
flushes were enough to cause fairly frequent (~2min) flushes of
replicas. Over time, this would add up to KUDU-1400-style problems with
too many DRS. The changes to fix KUDU-1400 address those problems, but
at the cost of write amplification based in part on the number of DRS
that need to be compacted. So, there's still some gains to be had from
increasing the average sizes of flushes. This patch adjusts how the
score increases with time: flush candidates with RAM anchored below the
target rowset size must wait for half of the time it takes to get the
maximum score before they get a positive score. So if data is coming in
very slowly Kudu will wait longer before flushing, flushing less
overall, but only for smaller amounts of data where the total RAM held
across many tablet replicas shouldn't be too large
(memory-pressure-bassed flushing would kick in then) and the amount of
data held in WALs shouldn't be very big.

Change-Id: I27fbd3ec34c5615d07deaf47b3400ca7c4ea426a
---
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica_mm_ops.cc
2 files changed, 63 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/12048/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I27fbd3ec34c5615d07deaf47b3400ca7c4ea426a
Gerrit-Change-Number: 12048
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2618 Factor the amount of data into time-based flush decisions

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.

Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12048 )

Change subject: KUDU-2618 Factor the amount of data into time-based flush decisions
......................................................................

Patch Set 1:

(10 comments)

Did you test this on a real cluster?

http://gerrit.cloudera.org:8080/#/c/12048/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12048/1//COMMIT_MSG@20
PS1, Line 20: This
Nit: consider starting a new paragraph here.

http://gerrit.cloudera.org:8080/#/c/12048/1//COMMIT_MSG@24
PS1, Line 24: Kudu will wait longer before flushing
Note that some integration tests (such as node_density-itest) set flush_threshold_secs to 1 in order to force the tserver to flush as quickly as possible. Such tests are probably flushing far less now.

To maintain the existing behavior, you should look at all tests that override flush_threshold_secs and flush_threshold_mb, and probably override flush_threshold_upper_bound_secs too.

http://gerrit.cloudera.org:8080/#/c/12048/1//COMMIT_MSG@27
PS1, Line 27: (memory-pressure-bassed flushing would kick in then) and the amount of
based

http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc
File src/kudu/tablet/tablet_replica_mm_ops.cc:

http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@75
PS1, Line 75: DEFINE_int32(flush_threshold_upper_bound_secs, 60 * 60,
Should probably tagged experimental, or at least advanced.

http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@76
PS1, Line 76: MemRowSet
Not just MRS.

http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@102
PS1, Line 102: if (anchored_mb > FLAGS_flush_threshold_mb) {
Rereading this code after not looking at it for years, I was somewhat surprised that we convert anchored RAM into a perf improvement score. I went back and reread how the maintenance manager prioritizes operations and summarized it below:
1. The low-IO operation that retains the most WAL bytes, if one exists (LogGC).
2. If we're under memory pressure (60% of the hard limit), the operation that anchors the most RAM (FlushMRS or FlushDMS).
3. The operation that retains the most WAL bytes provided that quantity is at least 1 GB (LogGC).
4. The operation that retains the most data bytes, selected only half of the time if there's an operation with a perf improvement (UndoDeltaGC).
5. The operation with the best perf improvement (various).

I guess the idea is that we still want to flush even when not under memory pressure, and given the above prioritization, the only way to get the MM to flush is to show a perf improvement.

http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@106
PS1, Line 106: MRS
Since this function is also used to set perf improvements for DMS flushes, I'd reword the comments here to account for the fact that we could be talking about a DMS, not an MRS.

http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@121
PS1, Line 121: ti
time

http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@128
PS1, Line 128: target_rowset_size
Nit: target_rowset_size_mb

http://gerrit.cloudera.org:8080/#/c/12048/1/src/kudu/tablet/tablet_replica_mm_ops.cc@133
PS1, Line 133: if (perf > 1.0) {
else if

--
To view, visit http://gerrit.cloudera.org:8080/12048
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27fbd3ec34c5615d07deaf47b3400ca7c4ea426a
Gerrit-Change-Number: 12048
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 06 Dec 2018 22:58:09 +0000
Gerrit-HasComments: Yes