Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2017/06/04 03:16:40 UTC

[kudu-CR] consensus: consolidate Raft thread pools

Adar Dembo has posted comments on this change.

Change subject: consensus: consolidate Raft thread pools
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6946/6//COMMIT_MSG
Commit Message:

PS6, Line 8: 
           : This patch consolidates the two thread pools used in Raft consensus: the
           : observer and "Raft" (for lack of a better name) pools. The former runs tasks
           : for infrequent events such as term and config changes while the latter is
           : used for periodic events such as processing heartbeat responses.
           : 
           : In this patch, each per-replica pool is consolidated into a single
           : per-server pool. Sequence tokens are used to ensure that tasks belonging to
           : a single replica are executed serially and in order. For the "Raft" pool, a
           : single sequence token is shared by the PeerManager and RaftConsensus; this
           : mirrors the existing sharing behavior and is safe because token operations
           : are thread-safe.
           : 
           : The non-default max_threads may come as a surprise, but David showed me how
           : tasks submitted to either pool may sometimes lead to an fsync (mostly via
           : cmeta flushes). As such, it isn't safe to use the default num_cpus upper
           : bound, as that may cause such IO-based tasks to block other CPU-only tasks
           : (or worse, periodic tasks like heartbeat response processing).
           : 
           : Which per-replica threads are not addressed by this patch?
           : - Failure detection thread: a threadpool isn't the right model
           :   for these. Something like a hash wheel timer would be ideal.
           : - Prepare thread pool (max_threads=1): these could be consolidated too, but
           :   their metrics are per-replica, and sequence tokens don't (yet) support
           :   that. If they were consolidated now, the metrics would be
           :   consolidated too, and thus less useful.
> would be good to have a few runs of raft_consensus-itest on dist-test with 
I made a bunch of changes to the threadpool slot implementation and to this consolidation patch. I ran raft_consensus-itest on dist-test in DEBUG and slow mode.

Out of 1000 runs, only 1 failed:
  F0603 02:23:28.156342 15190 test_workload.cc:220] Check failed: row_count >= expected_row_count (7382 vs. 7384)

The full log is available here: https://filebin.ca/3OloKjkUOdaK
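
For concreteness, here's a minimal sketch of the setup the commit message
describes, using the ThreadPool token API from the dependent patches. The
max_threads value and helper name are illustrative, and the exact
signatures may differ from the final patch:

  #include <memory>

  #include "kudu/util/status.h"
  #include "kudu/util/threadpool.h"

  using kudu::ThreadPool;
  using kudu::ThreadPoolBuilder;
  using kudu::ThreadPoolToken;

  kudu::Status BuildRaftPool() {
    // One pool per server rather than per replica. max_threads is
    // raised past the num_cpus default because tasks may fsync
    // (mostly via cmeta flushes) and shouldn't starve CPU-only work.
    std::unique_ptr<ThreadPool> raft_pool;
    RETURN_NOT_OK(ThreadPoolBuilder("raft")
                  .set_max_threads(30)  // illustrative value
                  .Build(&raft_pool));

    // Each replica gets a SERIAL token: its tasks run one at a time
    // and in submission order, even though all replicas share the
    // server-wide pool.
    std::unique_ptr<ThreadPoolToken> replica_token =
        raft_pool->NewToken(ThreadPool::ExecutionMode::SERIAL);
    RETURN_NOT_OK(replica_token->SubmitFunc([]() {
      // e.g. process a heartbeat response for one replica
    }));

    // Token operations are thread-safe, so the PeerManager and
    // RaftConsensus can share a single token.
    return kudu::Status::OK();
  }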


http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

PS6, Line 114: ThreadPool* raft_observers_pool
> noticed that, for consensus peers, you're passing the token whereas here yo
Done


PS6, Line 917: raft_observers_pool_token_.Wait()
> shutdown on a tp prevents more ops from being submitted while this doesn't,
Yes, this was the race we discussed last week.

I've since added a proper token shutdown routine and used it here.
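
For anyone following the thread, a sketch of the semantic difference,
assuming the new token Shutdown() mirrors ThreadPool::Shutdown() in
draining queued tasks and then rejecting new ones:

  // Before: Wait() drains this replica's queued tasks, but a racing
  // thread can still submit new work afterwards.
  raft_observers_pool_token_->Wait();

  // After: Shutdown() drains and then permanently rejects submissions,
  // so a racing submit fails with a non-OK Status instead of running
  // against torn-down state.
  raft_observers_pool_token_->Shutdown();
  kudu::Status s = raft_observers_pool_token_->SubmitFunc([]() {});
  // s is now non-OK: the task never runs.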


http://gerrit.cloudera.org:8080/#/c/6946/2/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

PS2, Line 434:   // The pool token which executes observe
> remove? also maybe add some info about the experiments you made where you f
I've removed the TODO.

I don't think it's safe to conclude that running notifications in parallel is unsafe, at least not based on my dist-test runs. All of the failures were row-count related, and the runs took place before you merged your fix for the semi-obscure bug that was producing bad row counts.


http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS6, Line 192: raft_pool
> on the peer manager side you call this request_* something. mind using the 
Done


PS6, Line 1743: raft_pool_token_->Wait();
> again, theres a change in semantics from "close the tp and wait for the ong
As we discussed last week, this particular semantic change wasn't dangerous: the "last task in a slot may destroy its token" exception I added to ThreadPool ensured that when RaftConsensus was destroyed, it was destroyed via its last task. RaftConsensus could still submit useless tasks to the token after Shutdown(), but they'd no-op and eventually lead to the object's destruction.

In any case this is now moot because I added proper token shutdown behavior, so token-based submission should now have identical semantics.
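
In other words, callers that race with Shutdown() can treat the token
exactly like a dedicated pool. A sketch (DoWork() is a stand-in for any
task RaftConsensus submits to itself):

  kudu::Status s = raft_pool_token_->SubmitFunc([this]() {
    DoWork();  // illustrative task
  });
  if (!s.ok()) {
    // The token was shut down: the replica is going away, so dropping
    // the task here matches the old per-replica pool behavior.
  }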


http://gerrit.cloudera.org:8080/#/c/6946/6/src/kudu/server/server_base.cc
File src/kudu/server/server_base.cc:

PS6, Line 255: queue-observers-pool
> use "raft_observers_pool" or something
Now moot: I further consolidated the two pools into one.


-- 
To view, visit http://gerrit.cloudera.org:8080/6946
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8958c607d11ac2e94908670c985059bef0ff5f54
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes