Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/08/31 01:00:35 UTC
[kudu-CR] WIP: tie log retention to consensus watermarks
Hello Mike Percy,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/4177
to review the following change.
Change subject: WIP: tie log retention to consensus watermarks
......................................................................
WIP: tie log retention to consensus watermarks
This changes the calculation of log retention to consult consensus.
Consensus returns a struct which indicates the watermark necessary for
durability (the committed index) as well as the watermark necessary to
catch up other peers. This replaces the old single "must be retained"
watermark that the log GC code used before.
The new struct is passed down into the log, and we use the following
policy:
- we always maintain any logs necessary for durability
- beyond that, we try to retain logs to catch up lagging peers, however
we never maintain more than --log_max_segments_to_retain (a new
configuration)
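As a rough illustration of the policy above, the sketch below walks the
segments oldest-first and stops at the first one that has to be kept. It
is not the code in log.cc: the Segment struct, the field names
for_durability and for_peers, and the function SegmentsToGC are
assumptions made for this example; only the flag name comes from this
change.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the struct returned by consensus. Each field is
// the lowest op index that still has to be kept for that purpose.
struct RetentionIndexes {
  int64_t for_durability;
  int64_t for_peers;
};

// Hypothetical summary of one closed WAL segment.
struct Segment {
  int64_t max_index;  // highest op index stored in the segment
};

// Returns how many of the oldest segments may be GCed. 'segs' is ordered
// oldest first. A real log would also keep its active segment; that detail
// is left out here.
int SegmentsToGC(const std::vector<Segment>& segs,
                 const RetentionIndexes& retention,
                 int max_segments_to_retain) {
  assert(max_segments_to_retain >= 1);
  const int total = static_cast<int>(segs.size());
  int gc = 0;
  for (; gc < total; ++gc) {
    const Segment& seg = segs[gc];
    // Rule 1: anything needed for durability is always kept.
    if (seg.max_index >= retention.for_durability) break;
    // Rule 2: keep segments for lagging peers, but only while the number
    // of segments left (this one included) fits under the cap.
    const int remaining = total - gc;
    if (seg.max_index >= retention.for_peers &&
        remaining <= max_segments_to_retain) {
      break;
    }
  }
  return gc;
}
```

With five segments ending at indexes 10..50, a durability watermark of 45
and a peer watermark of 15, a cap of 10 lets only the first segment go,
while a cap of 2 drops three. The durability rule is checked first, so
the cap can never cause a durability-relevant segment to be dropped.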
I removed the old flag --log_min_seconds_to_retain, since its main
purpose was for dealing with lagging peers, and that is now handled by
directly consulting consensus.
The one tricky bit of the policy is that, even though the peer catch-up
figures into log retention, we do _not_ want it to impact the
calculation of flush priority. In other words, even if the user is OK
retaining 10GB of logs to catch up trailing peers, they probably still
want to flush more aggressively than that so they can avoid very long
startup times. So, the peer-based watermark is not used during the
mapping of log anchors to retention amounts.
Note that the above is only relevant once we have implemented KUDU-38:
we currently will replay all of the retained logs even though we are
aggressively flushing to keep the durability-related retention bounded.
Manually tested for now as follows:
- started a three-node cluster (locally), set to roll logs at 1MB
segments, but otherwise default
- started an insert workload against a single-tablet table
- I could see that the three servers were maintaining 2 WAL segments in
their WAL directory.
- I kill -STOPped a random server while continuing to insert. I saw that
the WALs in this tablet server's directory froze as is (obviously),
and the other two kept rolling. However, because of this change, the
other servers started retaining WALs starting from the point where I
had stopped the follower.
- If I let the insert workload continue, the live servers kept rolling
up until they had 10 segments (default --log_max_segments_to_retain)
at which point they dropped the oldest log.
- I verified that, during this period while the extra segments were
retained, the servers continued to flush frequently so that their
recovery time would be bounded.
- I also verified that, if I un-paused the follower before the others
had evicted it, it was able to catch up, at which point the other
servers GCed those extra logs they had been retaining.
WIP because this needs some automated tests for the above behavior, and
I still need to sweep for comments to update, clean up, etc.
Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_peer-test.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
17 files changed, 193 insertions(+), 225 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/4177/1
--
To view, visit http://gerrit.cloudera.org:8080/4177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
[kudu-CR] WIP: tie log retention to consensus watermarks
Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.
Change subject: WIP: tie log retention to consensus watermarks
......................................................................
Patch Set 1:
(3 comments)
partial review. back on it later (may wait a little bit to see if it stops being a WIP)
http://gerrit.cloudera.org:8080/#/c/4177/1//COMMIT_MSG
Commit Message:
Line 60: WIP because this needs some automated tests for the above behavior, and
It would also be good to get an idea of the impact on startup. Our bootstrap already takes a pretty long time, and this might increase it by a lot. At the very least this would help prioritize KUDU-38.
http://gerrit.cloudera.org:8080/#/c/4177/1/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:
Line 650: rem_segs <= FLAGS_log_max_segments_to_retain) {
Idea (possibly unrelated to this patch): instead of a max-segments-to-retain flag, should we have a "max_percentage_of_drive_space" flag or something like that?
http://gerrit.cloudera.org:8080/#/c/4177/1/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:
Line 173: log::RetentionIndexes GetLogRetention() override;
rename to GetRetentionIndexes?
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes
[kudu-CR] Tie log retention to consensus watermarks
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: Tie log retention to consensus watermarks
......................................................................
Patch Set 5:
(6 comments)
http://gerrit.cloudera.org:8080/#/c/4177/5//COMMIT_MSG
Commit Message:
Line 11: durability (the committed index) as well as the watermark necessary to
> not sure about this link between the commit index and durability, at least
k, clarified message.
Line 18: - we always maintain any logs necessary for durability
> could you make this clearer? what do we do here now? calc the min between a
clarified
http://gerrit.cloudera.org:8080/#/c/4177/5/src/kudu/consensus/consensus.h
File src/kudu/consensus/consensus.h:
Line 266: // retain.
> nit no need to wrap
Done
http://gerrit.cloudera.org:8080/#/c/4177/5/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:
Line 1879: return log::RetentionIndexes(queue_->GetCommittedIndex(), // for durability
> add a note that it's ok not to get these atomically
Done
http://gerrit.cloudera.org:8080/#/c/4177/5/src/kudu/integration-tests/external_mini_cluster_fs_inspector.h
File src/kudu/integration-tests/external_mini_cluster_fs_inspector.h:
Line 70: // on TS 'index'.
> nit no need to wrap
Done
http://gerrit.cloudera.org:8080/#/c/4177/5/src/kudu/tablet/tablet_peer.h
File src/kudu/tablet/tablet_peer.h:
Line 193: // up peers.
> nit: wrapping
Done
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes
[kudu-CR] WIP: tie log retention to consensus watermarks
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4177
to look at the new patch set (#3).
Change subject: WIP: tie log retention to consensus watermarks
......................................................................
WIP: tie log retention to consensus watermarks
This changes the calculation of log retention to consult consensus.
Consensus returns a struct which indicates the watermark necessary for
durability (the committed index) as well as the watermark necessary to
catch up other peers. This replaces the old single "must be retained"
watermark that the log GC code used before.
The new struct is passed down into the log, and we use the following
policy:
- we always maintain any logs necessary for durability
- beyond that, we try to retain logs to catch up lagging peers, however
we never maintain more than --log_max_segments_to_retain (a new
configuration)
I removed the old flag --log_min_seconds_to_retain, since its main
purpose was for dealing with lagging peers, and that is now handled by
directly consulting consensus.
The one tricky bit of the policy is that, even though the peer catch-up
figures into log retention, we do _not_ want it to impact the
calculation of flush priority. In other words, even if the user is OK
retaining 10GB of logs to catch up trailing peers, they probably still
want to flush more aggressively than that so they can avoid very long
startup times. So, the peer-based watermark is not used during the
mapping of log anchors to retention amounts.
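To illustrate the flush-priority point: a minimal sketch, assuming the
retention amount attributed to an anchor is simply the total size of the
segments that the anchor keeps alive. The peer watermark is deliberately
not an input. SegmentInfo and BytesRetainedByAnchor are made-up names for
this example, not the actual interfaces touched by the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of one WAL segment.
struct SegmentInfo {
  int64_t max_index;   // highest op index stored in the segment
  int64_t size_bytes;  // on-disk size of the segment
};

// Returns the number of log bytes that cannot be GCed while an anchor at
// 'anchor_index' is registered. Only the anchor is considered: segments
// kept solely to catch up lagging peers do not inflate the result, so they
// do not influence how urgently a flush is scheduled.
int64_t BytesRetainedByAnchor(const std::vector<SegmentInfo>& segs,
                              int64_t anchor_index) {
  assert(anchor_index >= 0);
  int64_t bytes = 0;
  for (const SegmentInfo& seg : segs) {
    if (seg.max_index >= anchor_index) {
      bytes += seg.size_bytes;
    }
  }
  return bytes;
}
```

A flush that releases an anchor at a low index frees more log, so it would
score higher under this measure regardless of how far behind any peer is.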
Note that the above is only relevant once we have implemented KUDU-38:
we currently will replay all of the retained logs even though we are
aggressively flushing to keep the durability-related retention bounded.
In practice, even without KUDU-38, this patch shouldn't have a large
negative effect on restart times. In fact, in many cases it can
_improve_ startup times, because in most steady workloads we don't have
peers that are extremely far behind. Our log retention only increases in
those cases, and only on those tablets which have a lagging follower.
For other tablets, the new retention policies actually serve to reduce
the number of retained segments, so if there are no laggy peers, we'll
start up faster.
Manually tested for now as follows:
- started a three-node cluster (locally), set to roll logs at 1MB
segments, but otherwise default
- started an insert workload against a single-tablet table
- I could see that the three servers were maintaining 2 WAL segments in
their WAL directory.
- I kill -STOPped a random server while continuing to insert. I saw that
the WALs in this tablet server's directory froze as is (obviously),
and the other two kept rolling. However, because of this change, the
other servers started retaining WALs starting from the point where I
had stopped the follower.
- If I let the insert workload continue, the live servers kept rolling
up until they had 10 segments (default --log_max_segments_to_retain)
at which point they dropped the oldest log.
- I verified that, during this period while the extra segments were
retained, the servers continued to flush frequently so that their
recovery time would be bounded.
- I also verified that, if I un-paused the follower before the others
had evicted it, it was able to catch up, at which point the other
servers GCed those extra logs they had been retaining.
WIP because this needs some automated tests for the above behavior
Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_peer-test.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
19 files changed, 251 insertions(+), 319 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/4177/3
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] WIP: tie log retention to consensus watermarks
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: WIP: tie log retention to consensus watermarks
......................................................................
Patch Set 1:
(3 comments)
just responding to comments. will work on the TODOs in the patch and add tests.
http://gerrit.cloudera.org:8080/#/c/4177/1//COMMIT_MSG
Commit Message:
Line 60: WIP because this needs some automated tests for the above behavior, and
> would also be good to get an idea on what is the impact on startup. our boo
will add a comment to the commit message about this.
http://gerrit.cloudera.org:8080/#/c/4177/1/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:
Line 650: rem_segs <= FLAGS_log_max_segments_to_retain) {
> idea (possibly unrelated to this patch), instead of a max segments to retai
I'm hesitant to do it percentage-wise, because it's not always obvious how that plays with shared drive usage, i.e., is it "20% of the total drive" or "20% of the space that you want Kudu to be using"? I do agree that a size-based limit is better than a segment count, and it should probably be global across all tablets, not a per-tablet setting.
If you don't mind, I'll try to tackle this as part of KUDU-1567. I did mark the new 'max_segments_to_retain' as experimental so that we can break it.
http://gerrit.cloudera.org:8080/#/c/4177/1/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:
Line 173: log::RetentionIndexes GetLogRetention() override;
> rename to GetRetentionIndexes?
Done
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes
[kudu-CR] WIP: tie log retention to consensus watermarks
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: WIP: tie log retention to consensus watermarks
......................................................................
Patch Set 2:
Build Started http://104.196.14.100/job/kudu-gerrit/3262/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] Tie log retention to consensus watermarks
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: Tie log retention to consensus watermarks
......................................................................
Patch Set 7:
Build Started http://104.196.14.100/job/kudu-gerrit/3307/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] WIP: tie log retention to consensus watermarks
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: WIP: tie log retention to consensus watermarks
......................................................................
Patch Set 1:
Build Started http://104.196.14.100/job/kudu-gerrit/3156/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: No
[kudu-CR] Tie log retention to consensus watermarks
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: Tie log retention to consensus watermarks
......................................................................
Patch Set 5:
Build Started http://104.196.14.100/job/kudu-gerrit/3265/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] WIP: tie log retention to consensus watermarks
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4177
to look at the new patch set (#4).
Change subject: WIP: tie log retention to consensus watermarks
......................................................................
WIP: tie log retention to consensus watermarks
This changes the calculation of log retention to consult consensus.
Consensus returns a struct which indicates the watermark necessary for
durability (the committed index) as well as the watermark necessary to
catch up other peers. This replaces the old single "must be retained"
watermark that the log GC code used before.
The new struct is passed down into the log, and we use the following
policy:
- we always maintain any logs necessary for durability
- beyond that, we try to retain logs to catch up lagging peers, however
we never maintain more than --log_max_segments_to_retain (a new
configuration)
I removed the old flag --log_min_seconds_to_retain, since its main
purpose was for dealing with lagging peers, and that is now handled by
directly consulting consensus.
The one tricky bit of the policy is that, even though the peer catch-up
figures into log retention, we do _not_ want it to impact the
calculation of flush priority. In other words, even if the user is OK
retaining 10GB of logs to catch up trailing peers, they probably still
want to flush more aggressively than that so they can avoid very long
startup times. So, the peer-based watermark is not used during the
mapping of log anchors to retention amounts.
Note that the above is only relevant once we have implemented KUDU-38:
we currently will replay all of the retained logs even though we are
aggressively flushing to keep the durability-related retention bounded.
In practice, even without KUDU-38, this patch shouldn't have a large
negative effect on restart times. In fact, in many cases it can
_improve_ startup times, because in most steady workloads we don't have
peers that are extremely far behind. Our log retention only increases in
those cases, and only on those tablets which have a lagging follower.
For other tablets, the new retention policies actually serve to reduce
the number of retained segments, so if there are no laggy peers, we'll
start up faster.
Manually tested for now as follows:
- started a three-node cluster (locally), set to roll logs at 1MB
segments, but otherwise default
- started an insert workload against a single-tablet table
- I could see that the three servers were maintaining 2 WAL segments in
their WAL directory.
- I kill -STOPped a random server while continuing to insert. I saw that
the WALs in this tablet server's directory froze as is (obviously),
and the other two kept rolling. However, because of this change, the
other servers started retaining WALs starting from the point where I
had stopped the follower.
- If I let the insert workload continue, the live servers kept rolling
up until they had 10 segments (default --log_max_segments_to_retain)
at which point they dropped the oldest log.
- I verified that, during this period while the extra segments were
retained, the servers continued to flush frequently so that their
recovery time would be bounded.
- I also verified that, if I un-paused the follower before the others
had evicted it, it was able to catch up, at which point the other
servers GCed those extra logs they had been retaining.
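The manual scenario above can be approximated with a toy model. This is
not Kudu code: SimLog, RetainedAfterPause, the 100-ops-per-segment figure
and the assumption that flushes trail the log tail by 50 ops are all
invented for the illustration. It only shows the expected shape of the
behavior: retention grows while a follower is paused, stops at the cap,
and falls back once the follower catches up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model of one live server's closed WAL segments.
struct SimLog {
  std::deque<int64_t> closed;  // last op index per segment, oldest first
  int max_segments_to_retain;

  // Closes a segment ending at 'last_index', then GCs from the oldest end.
  void Roll(int64_t last_index, int64_t for_durability, int64_t for_peers) {
    closed.push_back(last_index);
    while (closed.size() > 1) {  // always keep the newest segment
      const int64_t oldest = closed.front();
      if (oldest >= for_durability) break;
      if (oldest >= for_peers &&
          closed.size() <= static_cast<size_t>(max_segments_to_retain)) {
        break;
      }
      closed.pop_front();
    }
  }
};

// Runs three healthy rolls, pauses a follower, rolls 'rolls_while_paused'
// more times, and optionally lets the follower catch up with one last roll.
// Returns the number of closed segments retained at the end.
size_t RetainedAfterPause(int max_segments_to_retain, int rolls_while_paused,
                          bool follower_resumes) {
  assert(max_segments_to_retain >= 1);
  SimLog log{{}, max_segments_to_retain};
  const int64_t kOpsPerSegment = 100;
  const int64_t kFlushLag = 50;  // assumed: flushes trail the tail by 50 ops
  int64_t last = 0;
  for (int i = 0; i < 3; ++i) {
    last += kOpsPerSegment;
    log.Roll(last, last - kFlushLag, last);  // peers fully caught up
  }
  const int64_t paused_at = last;  // the follower stops needing newer ops
  for (int i = 0; i < rolls_while_paused; ++i) {
    last += kOpsPerSegment;
    log.Roll(last, last - kFlushLag, paused_at);
  }
  if (follower_resumes) {
    last += kOpsPerSegment;
    log.Roll(last, last - kFlushLag, last);
  }
  return log.closed.size();
}
```

In this model, 5 rolls during the pause leave 6 closed segments, 20 rolls
leave 10 (the cap), and resuming the follower brings the count back to 1.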
The above scenario is also tested through modifications to
RaftConsensusITest.TestCatchupAfterOpsEvicted and various log-test test
cases.
Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_peer-test.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
22 files changed, 326 insertions(+), 335 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/4177/4
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] Tie log retention to consensus watermarks
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: Tie log retention to consensus watermarks
......................................................................
Patch Set 6:
Build Started http://104.196.14.100/job/kudu-gerrit/3286/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] Tie log retention to consensus watermarks
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4177
to look at the new patch set (#6).
Change subject: Tie log retention to consensus watermarks
......................................................................
Tie log retention to consensus watermarks
This changes the calculation of log retention to consult consensus.
Consensus returns a struct which indicates the watermark necessary for
durability (the committed index) as well as the watermark necessary to
catch up other peers. The durability watermark is then further
constrained by the LogAnchorRegistry, as before, to ensure that no entry
corresponding to yet-unflushed data can be GCed.
This new struct containing the "for-durability" and "for-peers"
watermarks replaces the old single "must be retained" watermark that the
log GC code used before.
The new struct is passed down into the log, and we use the following
policy:
- we always maintain any logs necessary for durability (the minimum of
those entries needed by consensus, and those entries needed by the
tablet itself)
- beyond that, we try to retain logs to catch up lagging peers, however
we never maintain more than --log_max_segments_to_retain (a new
configuration)
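The "minimum of those entries needed by consensus, and those entries
needed by the tablet itself" step can be sketched as follows. The struct
layout and the function name ConstrainByAnchors are assumptions for the
example; a null pointer stands in for "no anchor is currently
registered".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the struct returned by consensus.
struct RetentionIndexes {
  int64_t for_durability;  // lowest index needed for durability
  int64_t for_peers;       // lowest index needed to catch up peers
};

// Lowers the durability watermark to the earliest registered anchor, if
// there is one, so that entries covering unflushed data are never GCed.
// The peer watermark is passed through unchanged.
RetentionIndexes ConstrainByAnchors(RetentionIndexes from_consensus,
                                    const int64_t* earliest_anchor) {
  assert(from_consensus.for_durability >= 0);
  RetentionIndexes ret = from_consensus;
  if (earliest_anchor != nullptr) {
    ret.for_durability = std::min(ret.for_durability, *earliest_anchor);
  }
  return ret;
}
```

An anchor older than the committed index pulls the durability watermark
back; an anchor newer than it, or no anchor at all, leaves the value from
consensus in place.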
I removed the old flag --log_min_seconds_to_retain, since its main
purpose was for dealing with lagging peers, and that is now handled by
directly consulting consensus.
The one tricky bit of the policy is that, even though the peer catch-up
figures into log retention, we do _not_ want it to impact the
calculation of flush priority. In other words, even if the user is OK
retaining 10GB of logs to catch up trailing peers, they probably still
want to flush more aggressively than that so they can avoid very long
startup times. So, the peer-based watermark is not used during the
mapping of log anchors to retention amounts.
Note that the above is only relevant once we have implemented KUDU-38:
we currently will replay all of the retained logs even though we are
aggressively flushing to keep the durability-related retention bounded.
In practice, even without KUDU-38, this patch shouldn't have a large
negative effect on restart times. In fact, in many cases it can
_improve_ startup times, because in most steady workloads we don't have
peers that are extremely far behind. Our log retention only increases in
those cases, and only on those tablets which have a lagging follower.
For other tablets, the new retention policies actually serve to reduce
the number of retained segments, so if there are no laggy peers, we'll
start up faster.
Manually tested for now as follows:
- started a three-node cluster (locally), set to roll logs at 1MB
segments, but otherwise default
- started an insert workload against a single-tablet table
- I could see that the three servers were maintaining 2 WAL segments in
their WAL directory.
- I kill -STOPped a random server while continuing to insert. I saw that
the WALs in this tablet server's directory froze as is (obviously),
and the other two kept rolling. However, because of this change, the
other servers started retaining WALs starting from the point where I
had stopped the follower.
- If I let the insert workload continue, the live servers kept rolling
up until they had 10 segments (default --log_max_segments_to_retain)
at which point they dropped the oldest log.
- I verified that, during this period while the extra segments were
retained, the servers continued to flush frequently so that their
recovery time would be bounded.
- I also verified that, if I un-paused the follower before the others
had evicted it, it was able to catch up, at which point the other
servers GCed those extra logs they had been retaining.
The above scenario is also tested through modifications to
RaftConsensusITest.TestCatchupAfterOpsEvicted and various log-test test
cases.
Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_peer-test.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
23 files changed, 328 insertions(+), 336 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/4177/6
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] Tie log retention to consensus watermarks
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.
Change subject: Tie log retention to consensus watermarks
......................................................................
Patch Set 7: Code-Review+2
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] WIP: tie log retention to consensus watermarks
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: WIP: tie log retention to consensus watermarks
......................................................................
Patch Set 4:
Build Started http://104.196.14.100/job/kudu-gerrit/3264/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] Tie log retention to consensus watermarks
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Mike Percy, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4177
to look at the new patch set (#7).
Change subject: Tie log retention to consensus watermarks
......................................................................
Tie log retention to consensus watermarks
This changes the calculation of log retention to consult consensus.
Consensus returns a struct which indicates the watermark necessary for
durability (the committed index) as well as the watermark necessary to
catch up other peers. The durability watermark is then further
constrained by the LogAnchorRegistry, as before, to ensure that no entry
corresponding to yet-unflushed data can be GCed.
This new struct containing the "for-durability" and "for-peers"
watermarks replaces the old single "must be retained" watermark that the
log GC code used before.
The new struct is passed down into the log, and we use the following
policy:
- we always maintain any logs necessary for durability (the minimum of
those entries needed by consensus, and those entries needed by the
tablet itself)
- beyond that, we try to retain logs to catch up lagging peers; however,
we never retain more than --log_max_segments_to_retain segments (a new
configuration flag)
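As a rough illustration of the policy above (this is not the code in the patch), the segment-GC decision could be sketched as follows. The field names for_durability and for_peers, the SegmentsToGC function, and the per-segment "highest op index" summary are all assumptions made for this sketch. Each watermark is taken to mean the earliest op index that must still be retained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the retention struct described above.
struct RetentionIndexes {
  int64_t for_durability;  // earliest op index that must be kept for durability
  int64_t for_peers;       // earliest op index that a lagging peer still needs
};

// Returns how many of the oldest segments may be GCed.
// 'max_idx_per_segment' lists, oldest first, the highest op index in each
// closed segment (an assumed summary of the WAL for this sketch).
int SegmentsToGC(const std::vector<int64_t>& max_idx_per_segment,
                 RetentionIndexes r, int max_segments_to_retain) {
  assert(max_segments_to_retain >= 0);
  const int n = static_cast<int>(max_idx_per_segment.size());

  // Number of leading segments whose ops all precede 'idx'.
  auto prefix_before = [&](int64_t idx) {
    int c = 0;
    while (c < n && max_idx_per_segment[c] < idx) ++c;
    return c;
  };

  // Hard limit: never GC anything still needed for durability.
  const int gc_for_durability = prefix_before(r.for_durability);
  // What we would GC if we kept everything lagging peers still need.
  const int gc_for_peers = prefix_before(std::min(r.for_peers, r.for_durability));
  // What we must GC to stay within the segment cap.
  const int gc_for_cap = std::max(0, n - max_segments_to_retain);

  // Peers are best-effort: the cap may override them, but durability always wins.
  return std::min(std::max(gc_for_peers, gc_for_cap), gc_for_durability);
}
```

For example, with segments ending at indexes 10, 20, 30, 40 and 50, a durability watermark of 45 and a peer watermark of 15, this sketch GCs one segment when the cap is 10 and two segments when the cap is 3.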
I removed the old flag --log_min_seconds_to_retain, since its main
purpose was for dealing with lagging peers, and that is now handled by
directly consulting consensus.
The one tricky bit of the policy is that, even though the peer catch-up
figures into log retention, we do _not_ want it to impact the
calculation of flush priority. In other words, even if the user is OK
retaining 10GB of logs to catch up trailing peers, they probably still
want to flush more aggressively than that so they can avoid very long
startup times. So, the peer-based watermark is not used during the
mapping of log anchors to retention amounts.
Note that the above is only relevant once we have implemented KUDU-38:
we currently will replay all of the retained logs even though we are
aggressively flushing to keep the durability-related retention bounded.
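To illustrate the flush-priority point, here is a minimal sketch, under assumed names, of mapping a log anchor to a retention amount. SegmentInfo and BytesRetainedByAnchor are hypothetical and not taken from the patch. The only point being made is that the peer-based watermark is not an input to this calculation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-segment summary used only for this sketch.
struct SegmentInfo {
  int64_t max_op_idx;  // highest op index stored in the segment
  int64_t size_bytes;  // on-disk size of the segment
};

// Bytes of WAL that cannot be GCed while 'anchor_idx' remains unflushed.
// Deliberately takes no peer watermark: segments kept only to catch up
// lagging peers should not make a flush look more urgent.
int64_t BytesRetainedByAnchor(const std::vector<SegmentInfo>& segments,
                              int64_t anchor_idx) {
  assert(anchor_idx >= 0);
  int64_t bytes = 0;
  for (const SegmentInfo& s : segments) {
    // A segment is pinned by the anchor if it holds any op at or after it.
    if (s.max_op_idx >= anchor_idx) {
      bytes += s.size_bytes;
    }
  }
  return bytes;
}
```

A flush scheduler could then rank candidate flushes by this value, independently of how many extra segments are being held for trailing peers.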
In practice, even without KUDU-38, this patch shouldn't have a large
negative effect on restart times. In fact, in many cases it can
_improve_ startup times, because in most steady workloads we don't have
peers that are extremely far behind. Our log retention only increases in
those cases, and only on those tablets which have a lagging follower.
For other tablets, the new retention policies actually serve to reduce
the number of retained segments, so if there are no laggy peers, we'll
start up faster.
Manually tested for now as follows:
- started a three-node cluster (locally), set to roll logs at 1MB
segments, but otherwise default
- started an insert workload against a single-tablet table
- I could see that the three servers were maintaining 2 WAL segments in
their WAL directory.
- I kill -STOPped a random server while continuing to insert. I saw that
the WALs in this tablet server's directory froze as is (obviously),
and the other two kept rolling. However, because of this change, the
other servers started retaining WALs starting from the point where I
had stopped the follower.
- If I let the insert workload continue, the live servers kept rolling
up until they had 10 segments (default --log_max_segments_to_retain)
at which point they dropped the oldest log.
- I verified that, during this period while the extra segments were
retained, the servers continued to flush frequently so that their
recovery time would be bounded.
- I also verified that, if I un-paused the follower before the others
had evicted it, it was able to catch up, at which point the other
servers GCed those extra logs they had been retaining.
The above scenario is also tested through modifications to
RaftConsensusITest.TestCatchupAfterOpsEvicted and various log-test test
cases.
Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_peer-test.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
23 files changed, 328 insertions(+), 337 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/4177/7
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] Tie log retention to consensus watermarks
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4177
to look at the new patch set (#5).
Change subject: Tie log retention to consensus watermarks
......................................................................
Tie log retention to consensus watermarks
This changes the calculation of log retention to consult consensus.
Consensus returns a struct which indicates the watermark necessary for
durability (the committed index) as well as the watermark necessary to
catch up other peers. This replaces the old single "must be retained"
watermark that the log GC code used before.
The new struct is passed down into the log, and we use the following
policy:
- we always maintain any logs necessary for durability
- beyond that, we try to retain logs to catch up lagging peers; however,
we never retain more than --log_max_segments_to_retain segments (a new
configuration flag)
I removed the old flag --log_min_seconds_to_retain, since its main
purpose was for dealing with lagging peers, and that is now handled by
directly consulting consensus.
The one tricky bit of the policy is that, even though the peer catch-up
figures into log retention, we do _not_ want it to impact the
calculation of flush priority. In other words, even if the user is OK
retaining 10GB of logs to catch up trailing peers, they probably still
want to flush more aggressively than that so they can avoid very long
startup times. So, the peer-based watermark is not used during the
mapping of log anchors to retention amounts.
Note that the above is only relevant once we have implemented KUDU-38:
we currently will replay all of the retained logs even though we are
aggressively flushing to keep the durability-related retention bounded.
In practice, even without KUDU-38, this patch shouldn't have a large
negative effect on restart times. In fact, in many cases it can
_improve_ startup times, because in most steady workloads we don't have
peers that are extremely far behind. Our log retention only increases in
those cases, and only on those tablets which have a lagging follower.
For other tablets, the new retention policies actually serve to reduce
the number of retained segments, so if there are no laggy peers, we'll
start up faster.
Manually tested for now as follows:
- started a three-node cluster (locally), set to roll logs at 1MB
segments, but otherwise default
- started an insert workload against a single-tablet table
- I could see that the three servers were maintaining 2 WAL segments in
their WAL directory.
- I kill -STOPped a random server while continuing to insert. I saw that
the WALs in this tablet server's directory froze as is (obviously),
and the other two kept rolling. However, because of this change, the
other servers started retaining WALs starting from the point where I
had stopped the follower.
- If I let the insert workload continue, the live servers kept rolling
up until they had 10 segments (default --log_max_segments_to_retain)
at which point they dropped the oldest log.
- I verified that, during this period while the extra segments were
retained, the servers continued to flush frequently so that their
recovery time would be bounded.
- I also verified that, if I un-paused the follower before the others
had evicted it, it was able to catch up, at which point the other
servers GCed those extra logs they had been retaining.
The above scenario is also tested through modifications to
RaftConsensusITest.TestCatchupAfterOpsEvicted and various log-test test
cases.
Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_peer-test.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
23 files changed, 327 insertions(+), 336 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/4177/5
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] Tie log retention to consensus watermarks
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.
Change subject: Tie log retention to consensus watermarks
......................................................................
Tie log retention to consensus watermarks
This changes the calculation of log retention to consult consensus.
Consensus returns a struct which indicates the watermark necessary for
durability (the committed index) as well as the watermark necessary to
catch up other peers. The durability watermark is then further
constrained by the LogAnchorRegistry, as before, to ensure that no entry
corresponding to yet-unflushed data can be GCed.
This new struct containing the "for-durability" and "for-peers"
watermarks replaces the old single "must be retained" watermark that the
log GC code used before.
The new struct is passed down into the log, and we use the following
policy:
- we always maintain any logs necessary for durability (the minimum of
those entries needed by consensus, and those entries needed by the
tablet itself)
- beyond that, we try to retain logs to catch up lagging peers; however,
we never retain more than --log_max_segments_to_retain segments (a new
configuration flag)
I removed the old flag --log_min_seconds_to_retain, since its main
purpose was for dealing with lagging peers, and that is now handled by
directly consulting consensus.
The one tricky bit of the policy is that, even though the peer catch-up
figures into log retention, we do _not_ want it to impact the
calculation of flush priority. In other words, even if the user is OK
retaining 10GB of logs to catch up trailing peers, they probably still
want to flush more aggressively than that so they can avoid very long
startup times. So, the peer-based watermark is not used during the
mapping of log anchors to retention amounts.
Note that the above is only relevant once we have implemented KUDU-38:
we currently will replay all of the retained logs even though we are
aggressively flushing to keep the durability-related retention bounded.
In practice, even without KUDU-38, this patch shouldn't have a large
negative effect on restart times. In fact, in many cases it can
_improve_ startup times, because in most steady workloads we don't have
peers that are extremely far behind. Our log retention only increases in
those cases, and only on those tablets which have a lagging follower.
For other tablets, the new retention policies actually serve to reduce
the number of retained segments, so if there are no laggy peers, we'll
start up faster.
Manually tested for now as follows:
- started a three-node cluster (locally), set to roll logs at 1MB
segments, but otherwise default
- started an insert workload against a single-tablet table
- I could see that the three servers were maintaining 2 WAL segments in
their WAL directory.
- I kill -STOPped a random server while continuing to insert. I saw that
the WALs in this tablet server's directory froze as is (obviously),
and the other two kept rolling. However, because of this change, the
other servers started retaining WALs starting from the point where I
had stopped the follower.
- If I let the insert workload continue, the live servers kept rolling
up until they had 10 segments (default --log_max_segments_to_retain)
at which point they dropped the oldest log.
- I verified that, during this period while the extra segments were
retained, the servers continued to flush frequently so that their
recovery time would be bounded.
- I also verified that, if I un-paused the follower before the others
had evicted it, it was able to catch up, at which point the other
servers GCed those extra logs they had been retaining.
The above scenario is also tested through modifications to
RaftConsensusITest.TestCatchupAfterOpsEvicted and various log-test test
cases.
Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Reviewed-on: http://gerrit.cloudera.org:8080/4177
Reviewed-by: Mike Percy <mp...@apache.org>
Tested-by: Kudu Jenkins
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/integration-tests/delete_table-test.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_peer-test.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
23 files changed, 328 insertions(+), 337 deletions(-)
Approvals:
Mike Percy: Looks good to me, approved
Kudu Jenkins: Verified
--
Gerrit-MessageType: merged
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] Tie log retention to consensus watermarks
Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.
Change subject: Tie log retention to consensus watermarks
......................................................................
Patch Set 6: Code-Review+2
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] WIP: tie log retention to consensus watermarks
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4177
to look at the new patch set (#2).
Change subject: WIP: tie log retention to consensus watermarks
......................................................................
WIP: tie log retention to consensus watermarks
This changes the calculation of log retention to consult consensus.
Consensus returns a struct which indicates the watermark necessary for
durability (the committed index) as well as the watermark necessary to
catch up other peers. This replaces the old single "must be retained"
watermark that the log GC code used before.
The new struct is passed down into the log, and we use the following
policy:
- we always maintain any logs necessary for durability
- beyond that, we try to retain logs to catch up lagging peers; however,
we never retain more than --log_max_segments_to_retain segments (a new
configuration flag)
I removed the old flag --log_min_seconds_to_retain, since its main
purpose was for dealing with lagging peers, and that is now handled by
directly consulting consensus.
The one tricky bit of the policy is that, even though the peer catch-up
figures into log retention, we do _not_ want it to impact the
calculation of flush priority. In other words, even if the user is OK
retaining 10GB of logs to catch up trailing peers, they probably still
want to flush more aggressively than that so they can avoid very long
startup times. So, the peer-based watermark is not used during the
mapping of log anchors to retention amounts.
Note that the above is only relevant once we have implemented KUDU-38:
we currently will replay all of the retained logs even though we are
aggressively flushing to keep the durability-related retention bounded.
In practice, even without KUDU-38, this patch shouldn't have a large
negative effect on restart times. In fact, in many cases it can
_improve_ startup times, because in most steady workloads we don't have
peers that are extremely far behind. Our log retention only increases in
those cases, and only on those tablets which have a lagging follower.
For other tablets, the new retention policies actually serve to reduce
the number of retained segments, so if there are no laggy peers, we'll
start up faster.
Manually tested for now as follows:
- started a three-node cluster (locally), set to roll logs at 1MB
segments, but otherwise default
- started an insert workload against a single-tablet table
- I could see that the three servers were maintaining 2 WAL segments in
their WAL directory.
- I kill -STOPped a random server while continuing to insert. I saw that
the WALs in this tablet server's directory froze as is (obviously),
and the other two kept rolling. However, because of this change, the
other servers started retaining WALs starting from the point where I
had stopped the follower.
- If I let the insert workload continue, the live servers kept rolling
up until they had 10 segments (default --log_max_segments_to_retain)
at which point they dropped the oldest log.
- I verified that, during this period while the extra segments were
retained, the servers continued to flush frequently so that their
recovery time would be bounded.
- I also verified that, if I un-paused the follower before the others
had evicted it, it was able to catch up, at which point the other
servers GCed those extra logs they had been retaining.
WIP because this needs some automated tests for the above behavior.
Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
---
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/log-test-base.h
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_reader.cc
M src/kudu/consensus/log_reader.h
M src/kudu/consensus/raft_consensus-test.cc
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/tablet/tablet_peer-test.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
18 files changed, 248 insertions(+), 316 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/4177/2
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] WIP: tie log retention to consensus watermarks
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: WIP: tie log retention to consensus watermarks
......................................................................
Patch Set 3:
Build Started http://104.196.14.100/job/kudu-gerrit/3263/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] WIP: tie log retention to consensus watermarks
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: WIP: tie log retention to consensus watermarks
......................................................................
Patch Set 1:
OK, the new patch is largely cleaned up, though I'm still planning on adding a test or two. Worth reviewing now, though.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] Tie log retention to consensus watermarks
Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.
Change subject: Tie log retention to consensus watermarks
......................................................................
Patch Set 5:
(6 comments)
LGTM, mostly nits/verbiage
http://gerrit.cloudera.org:8080/#/c/4177/5//COMMIT_MSG
Commit Message:
Line 11: durability (the committed index) as well as the watermark necessary to
Not sure about this link between the commit index and durability. At least from a tablet perspective, we might need more than the commit index for durability, right?
Line 18: - we always maintain any logs necessary for durability
Could you make this clearer? What do we do here now? Calculate the min between any MRS/DMS anchors (the "real" durability watermarks) and the commit index?
http://gerrit.cloudera.org:8080/#/c/4177/5/src/kudu/consensus/consensus.h
File src/kudu/consensus/consensus.h:
Line 266: // retain.
nit: no need to wrap
http://gerrit.cloudera.org:8080/#/c/4177/5/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:
Line 1879: return log::RetentionIndexes(queue_->GetCommittedIndex(), // for durability
add a note that it's ok not to get these atomically
http://gerrit.cloudera.org:8080/#/c/4177/5/src/kudu/integration-tests/external_mini_cluster_fs_inspector.h
File src/kudu/integration-tests/external_mini_cluster_fs_inspector.h:
Line 70: // on TS 'index'.
nit: no need to wrap
http://gerrit.cloudera.org:8080/#/c/4177/5/src/kudu/tablet/tablet_peer.h
File src/kudu/tablet/tablet_peer.h:
Line 193: // up peers.
nit: wrapping
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes
[kudu-CR] Tie log retention to consensus watermarks
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.
Change subject: Tie log retention to consensus watermarks
......................................................................
Patch Set 6: Code-Review+1
(3 comments)
A couple of very minor nits, but they are ignorable. LGTM
http://gerrit.cloudera.org:8080/#/c/4177/6/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:
Line 68:
nit: extra line
Line 665:
nit: extra line
http://gerrit.cloudera.org:8080/#/c/4177/6/src/kudu/consensus/log.h
File src/kudu/consensus/log.h:
Line 304: // Helper method to get the segment sequence to GC based on the provided min_op_idx.
comment needs updating
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes
[kudu-CR] Tie log retention to consensus watermarks
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: Tie log retention to consensus watermarks
......................................................................
Patch Set 5: -Verified
Build Started http://104.196.14.100/job/kudu-gerrit/3268/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfc071270510f3dc3c65f88d615e93c6ffb26b12
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No