Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2017/06/04 03:16:40 UTC

[kudu-CR] consensus: consolidate Raft thread pools

Adar Dembo has posted comments on this change.

Change subject: consensus: consolidate Raft thread pools
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6946/6//COMMIT_MSG
Commit Message:

PS6, Line 8: 
           : This patch consolidates the two thread pools used in Raft consensus: the
           : observer and "Raft" (for lack of a better name) pools. The former runs tasks
           : for infrequent events such as term and config changes while the latter is
           : used for periodic events such as processing heartbeat responses.
           : 
           : In this patch, each per-replica pool is consolidated into a single
           : per-server pool. Sequence tokens are used to ensure that tasks belonging to
           : a single replica are executed serially and in order. For the "Raft" pool, a
           : single sequence token is shared by the PeerManager and RaftConsensus; this
           : mirrors the existing sharing behavior and is safe because token operations
           : are thread-safe.
           : 
           : The non-default max_threads may come as a surprise, but David showed me how
           : tasks submitted to either pool may sometimes lead to an fsync (mostly via
           : cmeta flushes). As such, it isn't safe to use the default num_cpus upper
           : bound, as that may cause such IO-based tasks to block other CPU-only tasks
           : (or worse, periodic tasks like heartbeat response processing).
           : 
           : Which per-replica threads are not addressed by this patch?
           : - Failure detection thread: a threadpool isn't the right model
           :   for these. Something like a hash wheel timer would be ideal.
           : - Prepare thread pool (max_threads=1): these could be consolidated too, but
           :   their metrics are per-replica, and sequence tokens don't (yet) support
           :   that. If they were consolidated now, the metrics would be
           :   consolidated too, and thus less useful.
> would be good to have a few runs of raft_consensus-itest on dist-test with 
I made a bunch of changes to the threadpool slot implementation and to this consolidation patch. I ran raft_consensus-itest on dist-test in DEBUG and slow mode.

Out of 1000 runs, only 1 failed:
  F0603 02:23:28.156342 15190 test_workload.cc:220] Check failed: row_count >= expected_row_count (7382 vs. 7384)

The full log is available here: https://filebin.ca/3OloKjkUOdaK
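
For concreteness, here's a minimal sketch of the setup the commit message
describes, using the ThreadPool token API from the dependent patches. The
max_threads value and helper name are illustrative, and the exact
signatures may differ from the final patch:

  #include <memory>

  #include "kudu/util/status.h"
  #include "kudu/util/threadpool.h"

  using kudu::ThreadPool;
  using kudu::ThreadPoolBuilder;
  using kudu::ThreadPoolToken;

  kudu::Status BuildRaftPool() {
    // One pool per server rather than per replica. max_threads is
    // raised past the num_cpus default because tasks may fsync
    // (mostly via cmeta flushes) and shouldn't starve CPU-only work.
    std::unique_ptr<ThreadPool> raft_pool;
    RETURN_NOT_OK(ThreadPoolBuilder("raft")
                  .set_max_threads(30)  // illustrative value
                  .Build(&raft_pool));

    // Each replica gets a SERIAL token: its tasks run one at a time
    // and in submission order, even though all replicas share the
    // server-wide pool.
    std::unique_ptr<ThreadPoolToken> replica_token =
        raft_pool->NewToken(ThreadPool::ExecutionMode::SERIAL);
    RETURN_NOT_OK(replica_token->SubmitFunc([]() {
      // e.g. process a heartbeat response for one replica
    }));

    // Token operations are thread-safe, so the PeerManager and
    // RaftConsensus can share a single token.
    return kudu::Status::OK();
  }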


http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS6, Line 114: ThreadPool* raft_observers_pool
> noticed that, for consensus peers, you're passing the token whereas here yo
Done


PS6, Line 917: raft_observers_pool_token_.Wait()
> shutdown on a tp prevents more ops from being submitted while this doesn't,
Yes, this was the race we discussed last week.

I've since added a proper token shutdown routine and used it here.
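
For anyone following the thread, a sketch of the semantic difference,
assuming the new token Shutdown() mirrors ThreadPool::Shutdown() in
draining queued tasks and then rejecting new ones:

  // Before: Wait() drains this replica's queued tasks, but a racing
  // thread can still submit new work afterwards.
  raft_observers_pool_token_->Wait();

  // After: Shutdown() drains and then permanently rejects submissions,
  // so a racing submit fails with a non-OK Status instead of running
  // against torn-down state.
  raft_observers_pool_token_->Shutdown();
  kudu::Status s = raft_observers_pool_token_->SubmitFunc([]() {});
  // s is now non-OK: the task never runs.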


http://gerrit.cloudera.org:8080/#/c/6946/2/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

PS2, Line 434:   // The pool token which executes observe
> remove? also maybe add some info about the experiments you made where you f
I've removed the TODO.

I don't think it's safe to conclude that running notifications in parallel is unsafe, at least not based on my dist-test runs. All of the failures were row-count related, and the runs took place before you merged your fix for the semi-obscure bug that was producing bad row counts.


http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS6, Line 192: raft_pool
> on the peer manager side you call this request_* something. mind using the 
Done


PS6, Line 1743: raft_pool_token_->Wait();
> again, theres a change in semantics from "close the tp and wait for the ong
As we discussed last week, this particular semantic change wasn't dangerous: the "last task in a slot may destroy its token" exception I added to ThreadPool ensured that when RaftConsensus was destroyed, it was destroyed via its last task. RaftConsensus could still submit useless tasks to the token after Shutdown(), but they'd no-op and eventually lead to the object's destruction.

In any case this is now moot because I added proper token shutdown behavior, so token-based submission should now have identical semantics.
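
In other words, callers that race with Shutdown() can treat the token
exactly like a dedicated pool. A sketch (DoWork() is a stand-in for any
task RaftConsensus submits to itself):

  kudu::Status s = raft_pool_token_->SubmitFunc([this]() {
    DoWork();  // illustrative task
  });
  if (!s.ok()) {
    // The token was shut down: the replica is going away, so dropping
    // the task here matches the old per-replica pool behavior.
  }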


http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/server/server_base.cc
File src/kudu/server/server_base.cc:

PS6, Line 255: queue-observers-pool
> use "raft_observers_pool" or something
Now moot: I further consolidated the two pools into one.


-- 
To view, visit http://gerrit.cloudera.org:8080/6946
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes