Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2019/04/05 19:00:55 UTC

chewbranca opened a new pull request #1998: Ioq per shard or user
URL: https://github.com/apache/couchdb/pull/1998
 
 
   # IOQ2 Improvements
   
   
   The IOQ system provides a way to prioritize a variety of request types so that
   standard database operations can run in parallel with background operations
   without negative impact. IOQ also provides levers to change the
   prioritization of request types on the fly, for example to favor view
   builds or compactions. IOQ1 was a single Erlang server pid that all requests
   funneled through, which, given the rise in CPU core counts and SSDs, quickly
   became a fundamental bottleneck in CouchDB.
   
   IOQ2 improves the situation by adding parallelism and faster data structures.
   You can find more details about IOQ2 in [1]. With the increased parallelism, the
   challenge becomes how to dispatch requests across a pool of processes. The
   simple approaches of round robin or uniform random distribution do a good job of
   spreading the load across the pool of processes, but they do not group
   requests together in a beneficial way. This lack of grouping is problematic
   because it reduces the amount of "deduping" we can do of identical requests,
   and it also greatly complicates prioritization of requests to an individual
   database shard.
   
   Deduping is very important because it means identical in-flight read
   requests are only performed once, with the result returned to all waiting
   readers. This is especially useful for inner b-tree nodes and can have a
   substantial impact on improving performance, reducing couch file overload, and
   reducing IO operations.
   
   The other big reason to group together requests for the same couch file is
   that it allows us to prioritize those requests in concert. When we spread
   requests to a single couch file pid across a pool of IOQ pids, we can't
   prioritize those requests as a whole, because prioritization happens at the
   level of each IOQ2 pid, not at a global level.
   
   This means that ideally we want to dispatch all requests for an individual
   couch file through a single IOQ2 pid. This is exactly what the `fd_hash`
   dispatch strategy does: it takes a hash of the couch file pid modulo the
   number of IOQ2 pids, so that all requests to a particular couch file go through
   the same IOQ2 pid. The problem with this approach is that you can have multiple
   heavily active couch file pids funneling through the same IOQ2 pid while other
   IOQ2 pids sit idle. This is problematic because those IOQ2 pids will eventually
   become bottlenecked (on metrics collection, amusingly enough), and also because
   there is currently no couch file concurrency control within an IOQ2 pid, so
   heavily active couch files can starve requests to other couch files.
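
   As a rough sketch (not the actual `ioq` implementation; the function and
   variable names are illustrative), `fd_hash` style dispatch boils down to
   something like:

   ```erlang
   %% Illustrative sketch of fd_hash dispatch: hash the couch_file pid into
   %% the pool so every request for a given fd lands on the same IOQ2 pid.
   %% Hot fds can still collide on a single queue, which is the downside
   %% described above.
   -spec dispatch_fd_hash(pid(), [pid()]) -> pid().
   dispatch_fd_hash(Fd, IOQPids) ->
       Index = erlang:phash2(Fd, length(IOQPids)) + 1,
       lists:nth(Index, IOQPids).
   ```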
   
   This PR introduces a new approach to dispatching requests across IOQ2 pids.
   Instead of having a fixed pool of IOQ2 pids to dispatch requests across, this PR
   introduces an `ioq_opener` process responsible for dynamically spawning IOQ2
   pids as appropriate, with configuration options for grouping requests by shard,
   user, db, or even request class. The result is a dedicated IOQ2 pid per group, so
   all of the related requests funnel through the same IOQ2 pid, which allows
   for deduping of requests and appropriate prioritization.
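
   Conceptually, the opener is a lookup-or-spawn step keyed on the grouping
   dimension. A minimal sketch of the idea (the real `ioq_opener` is a proper
   process with its own state; the names below are hypothetical):

   ```erlang
   %% Hypothetical sketch: map a grouping key (shard, user, db, or class) to
   %% a dedicated queue pid, spawning one on first use. Queues is the
   %% opener's key -> pid map.
   get_queue(Key, Queues) ->
       case maps:find(Key, Queues) of
           {ok, Pid} ->
               {Pid, Queues};
           error ->
               Pid = spawn_link(fun queue_loop/0),
               {Pid, maps:put(Key, Pid, Queues)}
       end.

   %% Stand-in for an actual IOQ2 queue process.
   queue_loop() ->
       receive _Msg -> queue_loop() end.
   ```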
   
   # Current Status
   
   This is currently a draft PR and provides a proof of concept implementation of
   the IOQ opener logic, and it also demonstrates a few approaches taken over the
   course of iterating through this. The work is spread out across two repos, first
   in the CouchDB repo [2] and second in the CouchDB IOQ repo [3]. Because the
   commits have been structured to illustrate different ideas and approaches, I'll
   link out to individual commits rather than comparing the branches as a whole.
   
   ## Initial Approach
   
   My initial approach was to hook the IOQ opener logic into the places that open
   the .couch file handles. My thought was that by fetching the appropriate IOQ pid
   when the database handle was opened, all clients of that database would be
   able to utilize that pid without needing to look it up again. So the approach
   was to ensure an IOQ pid was set _prior_ to storing the db reference in
   `couch_server`, such that all clients getting a db handle out of `couch_server`
   also get a handle to the appropriate IOQ pid. This works _ok_ in the simple
   cases, but starts to get complicated in a hurry.
   
   First off, view handles don't go through `couch_server`, so similar opener logic
   needs to be implemented for views and other places. Second, with Pluggable
   Storage Engines, the use of `couch_file` by the storage engine is an
   implementation detail and no longer guaranteed, yet views do not support PSEs,
   so we have an awkward mix of an indirection layer around `couch_file` in primary
   database operations and none in views. This is awkward because we need to key the
   IOQ pid on the `couch_file` pid so that all IOQ requests to the relevant
   `couch_file` go through the same IOQ pid, but the actual `couch_file` pid is
   hidden behind the PSE abstraction. This wasn't too hard to work around, and I
   added a `get_fd_pid` to the PSE implementation to support extracting it, but
   given we have to do similar things in `couch_server`,
   `couch_bt_engine_compactor`, `couch_db_updater`, `couch_mrview_updater`, etc., it
   seems like this is not an ideal approach.
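
   For context, the PSE change amounts to roughly one extra behaviour callback
   (a sketch; the exact name and signature in the branch may differ):

   ```erlang
   %% Hypothetical shape of the get_fd_pid addition to the storage engine
   %% behaviour: each engine exposes its underlying couch_file pid so IOQ
   %% dispatch can be keyed on it, despite the PSE abstraction hiding the fd.
   -callback get_fd_pid(DbHandle :: term()) -> pid().
   ```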
   
   You can see a rough version of this approach in the CouchDB repo at [4] and the
   corresponding changes in the IOQ repo at [5].
   
   ## Lazy Opener Approach
   
   I quickly became dissatisfied with the initial approach, especially the
   dichotomy around `couch_file` interactions, with some hidden behind the PSE
   abstraction and others hardcoded to use `couch_file` directly. I also didn't
   like how spread out the setting of IOQ pids was, and the requirement that all
   future uses of `couch_file` pids would need to properly set the pids as well.
   
   I started working on a lazy opener approach where we don't open the IOQ pid
   until we're in `ioq:call` and we conclude that an appropriate IOQ pid has not yet
   been opened. That pid is then set in the pdict of the caller pid so that further
   IOQ calls for that fd will utilize the pre-determined IOQ pid. This approach
   seems natural, but it has its own set of awkwardness, as we're only guaranteed to
   have the fd pid when we're in `ioq:call`, which means we can't easily
   determine the database name from the pid alone. I experimented with a hybrid
   approach that would run the opener logic when the database handle was
   opened and store a reference to the IOQ pid, keyed off the fd, in the
   `ioq_opener`, with the idea that we can then just lazily look up the appropriate
   IOQ pid based on the fd at `ioq:call` time. That worked _ok_, but I was not a
   fan of sometimes presetting the IOQ pid and other times not.
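
   A minimal sketch of the lazy lookup described above (assuming a registered
   `ioq_opener` process and a hypothetical `{get_queue, Fd}` call; the real
   branch may differ):

   ```erlang
   %% Illustrative only: cache the chosen IOQ pid in the caller's process
   %% dictionary, keyed on the couch_file pid, so repeated ioq:call
   %% invocations for the same fd skip the ioq_opener round trip.
   ioq_pid_for(Fd) ->
       case erlang:get({ioq_pid, Fd}) of
           undefined ->
               Pid = gen_server:call(ioq_opener, {get_queue, Fd}),
               erlang:put({ioq_pid, Fd}, Pid),
               Pid;
           Pid ->
               Pid
       end.
   ```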
   
   My next iteration on this approach was to switch to using the `#ioq_request{}`
   record as the lookup value to `ioq_opener` rather than the fd or db name. This
   request record _should_ contain the appropriate shard name and IOQ class and
   other relevant information we would want to use for dispatching across IOQ pids.
   You can see the combination of these two approaches in [6].
   
   The tricky bit here is the part where we *should* have the appropriate shard
   name and other information. The problem is that this information is extracted
   from the `io_priority` parameter to `ioq:call`, which, if set, contains the
   shard name. However, there is no hard requirement that `io_priority` be set,
   which means there are a number of places that don't set an appropriate IOQ
   priority and just use the default values. There's never a reason not to want an
   appropriate `io_priority` set, so this seems like an opportune time to finally
   rectify the lack of `io_priority` values. To do that, I added temporary measures
   to explode loudly when an IOQ request was missing an `io_priority` value. You
   can see those changes for CouchDB and IOQ in [7] and [8], respectively.
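
   The "explode loudly" check is essentially a guard along these lines (a
   sketch, not the exact code in [7]/[8]):

   ```erlang
   %% Illustrative temporary guard: crash when ioq:call is handed an
   %% undefined io_priority, so missing call sites surface as test failures
   %% rather than silently falling back to default priorities.
   assert_io_priority(undefined) ->
       erlang:error({missing_io_priority, self()});
   assert_io_priority(Priority) ->
       Priority.
   ```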
   
   So the next step was to set `io_priority` in all the places where it was not
   appropriately set. I accomplished this by adding the explode-loudly changes
   above and then running the test suite to trigger explosions. I've updated all
   of the missing `io_priority` locations with appropriate values in [9]. It's
   worth noting that this is not guaranteed to be an exhaustive list, as this is a
   runtime check and coverage is only as good as what the eunit test suite
   exercises. It would not surprise me if there are a few other corner cases
   where `io_priority` is not properly set.
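
   For reference, setting the priority is a matter of putting an appropriate
   tuple in the calling process's dictionary before any IO happens (the exact
   class tuples vary by subsystem; the shard and design doc names below are
   purely illustrative):

   ```erlang
   %% Interactive (client-facing) IO against a given shard:
   erlang:put(io_priority, {interactive, <<"shards/00000000-1fffffff/mydb.1554480000">>}),

   %% Background work uses a different class, e.g. a view build:
   erlang:put(io_priority, {view_update, <<"shards/00000000-1fffffff/mydb.1554480000">>, <<"_design/foo">>}).
   ```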
   
   Unfortunately, the eunit test suite almost entirely ignores setting
   `io_priority` values and relies on an unset priority being an acceptable
   course of action. If we're going to ensure that `io_priority` is set, then we
   need to make sure it's set in the test suite as well. The somewhat arduous
   commit to set `io_priority` throughout all the test suites is in [10]. I think
   we can simplify that with some logic and utilities around setting `io_priority`
   in the test suites (see the sketch below), but [10] does the initial leg work
   and sets it manually everywhere it needs to be set so that the test suite
   passes. Figuring out what to do on that front is one of the primary questions
   to be answered as part of this work, but I think we would benefit considerably
   from a more rigorous approach to `io_priority`, so I took the time to make it
   work up front and give us a view of what it entails.
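
   One possible shape for such a test utility (purely a sketch; name and
   placement are hypothetical):

   ```erlang
   %% Run a test body with io_priority set, restoring whatever was there
   %% before, so individual test cases don't have to sprinkle erlang:put/2
   %% calls by hand.
   with_io_priority(Priority, Fun) ->
       Old = erlang:put(io_priority, Priority),
       try
           Fun()
       after
           case Old of
               undefined -> erlang:erase(io_priority);
               _ -> erlang:put(io_priority, Old)
           end
       end.
   ```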
   
   The changes in these two branches result in a proof of concept version of the
   `ioq_opener` logic that lazily assigns IOQ pids as needed based on the
   dimensions extracted from the `#ioq_request{}`, ensures that `io_priority` is
   always set, and keeps the test suite passing.
   
   
   # IOQ Pid Eviction
   
   If we're going to be dynamically spawning IOQ pids as needed, we need some
   mechanism to determine when to clear those processes out. The approach taken
   here is basically reference counting the clients of those pids. The `ioq_opener`
   process monitors the `couch_file` pids, view pids, compaction pids, client
   pids, etc., and when all the relevant pids using an IOQ pid have exited, the
   IOQ pid exits as well.
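
   A sketch of the bookkeeping involved (names hypothetical; the real
   `ioq_opener` drives this from `'DOWN'` messages for the pids it monitors):

   ```erlang
   %% Refs maps each queue pid to the set of client pids still using it.
   %% When a monitored client exits, drop it from the set; once the set is
   %% empty, shut the queue pid down and forget about it.
   client_down(Client, QueuePid, Refs) ->
       Clients0 = maps:get(QueuePid, Refs, sets:new()),
       Clients = sets:del_element(Client, Clients0),
       case sets:size(Clients) of
           0 ->
               exit(QueuePid, shutdown),
               maps:remove(QueuePid, Refs);
           _ ->
               maps:put(QueuePid, Clients, Refs)
       end.
   ```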
   
   # What is shared per pid?
   
   One question is how many different client types should funnel through the same IOQ
   pid. For instance, when we have a `couch_file` pid for a particular database
   shard, and we have corresponding view pids for that database shard, should they
   all go through the same IOQ pid? I believe they should, as otherwise we can't
   actually prioritize interactive database requests versus compaction/views/etc.
   Essentially the IOQ pid should be determined by database shard, not
   necessarily the exact file that is currently being interacted with. That said,
   there are a few different dispatch strategies, one of which is fd-pid-based
   dispatch, so you _can_ run an IOQ pid per `couch_file` if so desired. You can see
   the dispatch logic in [11].
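
   Conceptually, each dispatch strategy just picks a different grouping key
   from the request before handing it to the opener. A sketch of that idea
   (using a map instead of the actual `#ioq_request{}` record, with
   illustrative field names; the real logic lives in [11]):

   ```erlang
   %% Pick the key that requests are grouped by, per configured strategy.
   %% Grouping by shard sends db, view, and compaction traffic for a shard
   %% to one queue; grouping by fd gives each couch_file its own queue.
   dispatch_key(shard, Req) -> maps:get(shard, Req);
   dispatch_key(db, Req)    -> maps:get(db, Req);
   dispatch_key(user, Req)  -> maps:get(user, Req);
   dispatch_key(class, Req) -> maps:get(class, Req);
   dispatch_key(fd, Req)    -> maps:get(fd, Req).
   ```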
   
   # Status
   
   This currently works and passes the test suite, although I want to add
   `ioq_opener` specific tests to ensure the eviction and dispatch logic is
   functioning properly. There are some inconsistencies with shard names flying
   around that need to be resolved to ensure all requests go to the appropriate
   IOQ pid. The IOQ config settings also need to be updated to allow per IOQ pid
   config values. This is a great feature and will allow per-shard concurrency
   levels and per-class priority values.
   
   Let me know what you think!
   
   
   # References
   
   [1] https://github.com/apache/couchdb-ioq/blob/master/IOQ2.md
   [2] https://github.com/apache/couchdb/tree/ioq-per-shard-or-user
   [3] https://github.com/apache/couchdb-ioq/tree/ioq-per-shard-or-user
   [4] https://github.com/apache/couchdb/commit/1644b1dfc93106792631a7688ed5bd413dddd03b
   [5] https://github.com/apache/couchdb-ioq/commit/1dee640342c4db8756606bff3962cfed6efc8bc1
   [6] https://github.com/apache/couchdb-ioq/commit/acd776b23cf13235782de026a8d861b064214f6e
   [7] https://github.com/apache/couchdb/commit/3b4cf711a13ac8afd41271118fca69f5c4310edf
   [8] https://github.com/apache/couchdb-ioq/commit/c5e91ca9660579c634360fc63422fe2cee237d4a
   [9] https://github.com/apache/couchdb/commit/b85c17d569ac6e631a02acc0eecf2bdc6090c67c
   [10] https://github.com/apache/couchdb/commit/733b025be28611491de0b478534ce5984a8bd5ce
   [11] https://github.com/apache/couchdb-ioq/blob/ioq-per-shard-or-user/src/ioq_opener.erl#L148-L171
   
