Posted to user@accumulo.apache.org by Anthony F <af...@gmail.com> on 2014/02/07 21:19:51 UTC
tserver side parallelism
How do the config variables tserver.readahead.concurrent.max and
tserver.scan.files.open.max interact with BatchScanner threads requested
from the Connector? I have tserver.readahead.concurrent.max set to 64 and
tserver.scan.files.open.max set to 100. However, unless I bump up the
number of BatchScanner threads, I don't see much tserver side parallelism.
If I bump up the number of BatchScanner threads, then I can see multiple
scans per tserver. What governs the number of tserver side threads used to
execute a scan and what prevents too many threads from spinning up to
service multiple concurrent scans from independent clients?
Re: tserver side parallelism
Posted by Josh Elser <jo...@gmail.com>.
The tserver.readahead.concurrent.max property sets an upper bound on
the number of scans that can be "reading ahead" at once. Read-ahead is
a performance tweak that tries to smooth the I/O cost of reading from
files. However, each readahead thread increases heap usage, because the
data it reads is held in memory. By capping the number of concurrent
readaheads, this parameter lets you bound the memory used by readahead
across *all* scan tasks (from a Scanner, BatchScanner or even major
compactions) on a tablet server.
The tserver.scan.files.open.max property gives you control over the
upper bound on the number of files that a tablet server (across all
tablets hosted by that tablet server) can have open for scanning. Again,
because holding these files open consumes memory, this parameter is
meant to let you place an upper bound on the memory consumed by open
files.
Now, the number of threads that a BatchScanner uses is what's primarily
going to control your "server side parallelism". When you provide a
value of N for the BatchScanner "threads", you will get up to N "scan
tasks" running concurrently against your Accumulo instance. The two
previously described properties only act to limit the resources that
your single BatchScanner (considered alongside all other active
BatchScanners) can consume.
In situations with multiple clients reading from an Accumulo instance,
you may run into cases where a scan task (one thread from your
BatchScanner) is blocked until the tablet server finishes a previous
read and thus frees the resources (open-file slots or readahead
threads) needed to satisfy your scan request.
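As a rough illustration of the last two paragraphs, here is a toy model in plain Java. It is not Accumulo code: the class and method names are made up for this sketch, the client thread pool stands in for the BatchScanner threads, and a single Semaphore stands in for whatever server-side limit applies. Under those assumptions, the number of scan tasks running at once can never exceed the smaller of the client thread count and the server-side limit.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: none of these names come from Accumulo.
class ScanTaskModel {

    /**
     * Runs `tasks` simulated scan tasks on `clientThreads` client threads
     * against a "server" that admits at most `serverLimit` tasks at once.
     * Returns the highest number of tasks observed running concurrently.
     */
    static int peakConcurrency(int clientThreads, int serverLimit, int tasks) {
        ExecutorService client = Executors.newFixedThreadPool(clientThreads);
        Semaphore serverSlots = new Semaphore(serverLimit);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < tasks; i++) {
            client.submit(() -> {
                try {
                    // Blocks while every server-side slot is in use.
                    serverSlots.acquire();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(10); // stand-in for the actual read
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    serverSlots.release(); // frees the slot for a waiting task
                }
            });
        }

        client.shutdown();
        try {
            client.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // Few client threads, generous server limit: the client is the cap.
        System.out.println("2 threads, limit 64: peak = "
                + peakConcurrency(2, 64, 16));
        // Many client threads, tight server limit: tasks queue on the server.
        System.out.println("16 threads, limit 3: peak = "
                + peakConcurrency(16, 3, 32));
    }
}
```

In the first call the peak is bounded by the 2 client threads, which matches the observation that raising the BatchScanner thread count is what exposes more parallelism. In the second call the extra client threads simply wait on the semaphore, which is the blocking case described above.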
Hope that helps.