You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2019/12/15 02:51:41 UTC

[kudu-CR] KUDU-3002: prioritize WAL unanchoring when under memory pressure

Hello Adar Dembo, Bankim Bhavsar, Todd Lipcon,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/14910

to review the following change.


Change subject: KUDU-3002: prioritize WAL unanchoring when under memory pressure
......................................................................

KUDU-3002: prioritize WAL unanchoring when under memory pressure

When under memory pressure, we currently prioritize performing the op
that will free the most memory. In theory this seems reasonable -- if
Kudu's using way too much memory, we should try to use less memory as
quickly as possible. In practice, this meant that, while under memory
pressure and under a constant, memory-consuming workload (e.g. inserts
to the MRS), Kudu would starve operations that anchor relatively little
memory (e.g. DMS flushes).

This patch updates the behavior so that we prioritize operations that
unanchor the most WAL bytes, breaking ties by prioritizing the ops that
use more memory. This seems reasonable because:
- Ops that anchor WALs also anchor memory. In performing an op that
  unanchor WALs, we are performing an op that frees memory, so we'll
  still prefer MRS and DMS flushing over compactions when under memory
  pressure.
- We already use this heuristic when _not_ under memory pressure, but it
  is only used when the ops under consideration anchor too many WAL
  bytes (per --log_target_replay_size_mb), lending some credibility to
  it when used conditionally.
- It becomes much more difficult to think of a scenario in which we're
  "stuck" using too much space for WALs or too much memory.

Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
---
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
3 files changed, 54 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/10/14910/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
Gerrit-Change-Number: 14910
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-3002: prioritize WAL unanchoring when under memory pressure

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Bankim Bhavsar, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14910

to look at the new patch set (#2).

Change subject: KUDU-3002: prioritize WAL unanchoring when under memory pressure
......................................................................

KUDU-3002: prioritize WAL unanchoring when under memory pressure

When under memory pressure, we currently prioritize performing the op
that will free the most memory. In theory this seems reasonable -- if
Kudu's using way too much memory, we should try to use less memory as
quickly as possible. In practice, this meant that, while under memory
pressure and under a constant, memory-consuming workload (e.g. inserts
to the MRS), Kudu would starve operations that anchor relatively little
memory (e.g. DMS flushes).

This patch updates the behavior so that we prioritize operations that
unanchor the most WAL bytes, breaking ties by prioritizing the ops that
use more memory. This seems reasonable because:
- Ops that anchor WALs also anchor memory. In performing an op that
  unanchor WALs, we are performing an op that frees memory, so we'll
  still prefer MRS and DMS flushing over compactions when under memory
  pressure.
- We already use this heuristic when _not_ under memory pressure, but it
  is only used when the ops under consideration anchor too many WAL
  bytes (per --log_target_replay_size_mb), lending some credibility to
  it when used conditionally.
- It becomes much more difficult to think of a scenario in which we're
  "stuck" using too much space for WALs or too much memory.

Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
---
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
4 files changed, 102 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/10/14910/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
Gerrit-Change-Number: 14910
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-3002: prioritize WAL unanchoring when under memory pressure

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Bankim Bhavsar, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14910

to look at the new patch set (#3).

Change subject: KUDU-3002: prioritize WAL unanchoring when under memory pressure
......................................................................

KUDU-3002: prioritize WAL unanchoring when under memory pressure

When under memory pressure, we currently prioritize performing the op
that will free the most memory. In theory this seems reasonable -- if
Kudu's using way too much memory, we should try to use less memory as
quickly as possible. In practice, this meant that, while under memory
pressure and under a constant, memory-consuming workload (e.g. inserts
to the MRS), Kudu would starve operations that anchor relatively little
memory (e.g. DMS flushes).

This patch updates the behavior so that we prioritize operations that
unanchor the most WAL bytes, breaking ties by prioritizing the ops that
use more memory. This seems reasonable because:
- Ops that anchor WALs also anchor memory. In performing an op that
  unanchor WALs, we are performing an op that frees memory, so we'll
  still prefer MRS and DMS flushing over compactions when under memory
  pressure.
- We already use this heuristic when _not_ under memory pressure, but it
  is only used when the ops under consideration anchor too many WAL
  bytes (per --log_target_replay_size_mb), lending some credibility to
  it when used conditionally.
- It becomes much more difficult to think of a scenario in which we're
  "stuck" using too much space for WALs or too much memory.

Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
---
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
4 files changed, 117 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/10/14910/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
Gerrit-Change-Number: 14910
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-3002: prioritize WAL unanchoring when under memory pressure

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14910 )

Change subject: KUDU-3002: prioritize WAL unanchoring when under memory pressure
......................................................................


Patch Set 2:

Forgot to mention: could you sanity check this on a real cluster?


-- 
To view, visit http://gerrit.cloudera.org:8080/14910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
Gerrit-Change-Number: 14910
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 16 Dec 2019 21:28:14 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3002: prioritize WAL unanchoring when under memory pressure

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14910 )

Change subject: KUDU-3002: prioritize WAL unanchoring when under memory pressure
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
Gerrit-Change-Number: 14910
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 17 Dec 2019 19:49:40 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3002: prioritize WAL unanchoring when under memory pressure

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14910 )

Change subject: KUDU-3002: prioritize WAL unanchoring when under memory pressure
......................................................................


Patch Set 3:

> Patch Set 2:
> 
> Forgot to mention: could you sanity check this on a real cluster?

Tested on a real cluster with a 1% memory pressure limit and 1GiB hard limit, with a constant insert loadgen workload and manual upserts.

https://imgur.com/MI4BG6Q

I eventually saw DMS flushes a few seconds after upserting (i.e. the DMS flushes are not being triggered immediately, just when they become "old" enough to unanchor WALs).


-- 
To view, visit http://gerrit.cloudera.org:8080/14910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
Gerrit-Change-Number: 14910
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 17 Dec 2019 08:58:44 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3002: prioritize WAL unanchoring when under memory pressure

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14910 )

Change subject: KUDU-3002: prioritize WAL unanchoring when under memory pressure
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14910/2/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14910/2/src/kudu/util/maintenance_manager.cc@429
PS2, Line 429: ignore the target replay size and flush whichever op
             :   // anchors the most WALs (the op should also free memory).
> Could you elaborate on why this is better than flushing whatever anchors th
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
Gerrit-Change-Number: 14910
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 17 Dec 2019 08:35:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3002: prioritize WAL unanchoring when under memory pressure

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14910 )

Change subject: KUDU-3002: prioritize WAL unanchoring when under memory pressure
......................................................................

KUDU-3002: prioritize WAL unanchoring when under memory pressure

When under memory pressure, we currently prioritize performing the op
that will free the most memory. In theory this seems reasonable -- if
Kudu's using way too much memory, we should try to use less memory as
quickly as possible. In practice, this meant that, while under memory
pressure and under a constant, memory-consuming workload (e.g. inserts
to the MRS), Kudu would starve operations that anchor relatively little
memory (e.g. DMS flushes).

This patch updates the behavior so that we prioritize operations that
unanchor the most WAL bytes, breaking ties by prioritizing the ops that
use more memory. This seems reasonable because:
- Ops that anchor WALs also anchor memory. In performing an op that
  unanchor WALs, we are performing an op that frees memory, so we'll
  still prefer MRS and DMS flushing over compactions when under memory
  pressure.
- We already use this heuristic when _not_ under memory pressure, but it
  is only used when the ops under consideration anchor too many WAL
  bytes (per --log_target_replay_size_mb), lending some credibility to
  it when used conditionally.
- It becomes much more difficult to think of a scenario in which we're
  "stuck" using too much space for WALs or too much memory.

Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
Reviewed-on: http://gerrit.cloudera.org:8080/14910
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
4 files changed, 117 insertions(+), 18 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/14910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
Gerrit-Change-Number: 14910
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-3002: prioritize WAL unanchoring when under memory pressure

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14910 )

Change subject: KUDU-3002: prioritize WAL unanchoring when under memory pressure
......................................................................


Patch Set 2:

(1 comment)

TSAN failure looks test-only but legit.

http://gerrit.cloudera.org:8080/#/c/14910/2/src/kudu/util/maintenance_manager.cc
File src/kudu/util/maintenance_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14910/2/src/kudu/util/maintenance_manager.cc@429
PS2, Line 429: ignore the target replay size and flush whichever op
             :   // anchors the most WALs (the op should also free memory).
Could you elaborate on why this is better than flushing whatever anchors the most memory?



-- 
To view, visit http://gerrit.cloudera.org:8080/14910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibd85e8f2904a36b74cd6a3038c9ec49bb1ff9844
Gerrit-Change-Number: 14910
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 16 Dec 2019 21:28:03 +0000
Gerrit-HasComments: Yes