Posted to user@cassandra.apache.org by Jimmy Lin <y2...@gmail.com> on 2014/10/30 04:41:42 UTC

tuning concurrent_reads param

Hi,
Looking at the docs, the default value for concurrent_reads is 32, which
seems a bit small to me (compared to, say, an HTTP server). If my node is
receiving even slight traffic, won't any read queries beyond 32 concurrent
ones have to wait?

The recommended rule is 16 * number of drives. Would that be different if I
have SSDs?

I am attempting to increase it because I have a few tables with wide rows
that the app fetches. The sheer size of that data may already be eating up
thread time, which can cause other reads to wait and essentially slow down.

thanks
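A fixed-size pool behaves roughly the way the question describes: reads beyond the pool size are queued rather than rejected, and they wait for a free thread. The sketch below is only an analogy. It assumes Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in for Cassandra's read thread pool (which is Java and differs in detail); `run_reads` and `fake_read` are hypothetical names, and `time.sleep` stands in for disk I/O.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_reads(num_reads: int, pool_size: int, read_seconds: float = 0.01) -> int:
    """Run num_reads simulated reads on pool_size threads; return peak concurrency."""
    lock = threading.Lock()
    active = 0  # reads currently executing on a thread
    peak = 0    # highest value `active` reached

    def fake_read() -> None:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(read_seconds)  # stand-in for disk I/O
        with lock:
            active -= 1

    # Tasks beyond pool_size are queued by the executor, not rejected.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(fake_read) for _ in range(num_reads)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return peak


if __name__ == "__main__":
    print(run_reads(num_reads=100, pool_size=32))  # never more than 32
```

Submitting 100 simulated reads to a pool of 32 never yields a peak above 32; the remaining reads sit in the executor's queue until a thread frees up.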

Re: tuning concurrent_reads param

Posted by Jimmy Lin <y2...@gmail.com>.

I see, thanks for explaining what that means.

If we are using SSDs, then reordering/merging has less impact than on a
traditional mechanical hard disk, so an SSD can probably handle an
increased concurrent_reads better (?)

Re: tuning concurrent_reads param

Posted by Bryan Talbot <br...@playnext.com>.

On Wed, Nov 5, 2014 at 11:00 PM, Jimmy Lin <y2...@gmail.com> wrote:

> Sorry, I have a late follow-up question....
>
> In the cassandra.yaml file, the concurrent_reads section has the following
> comment:
>
> What does it mean by "the operations to enqueue low enough in the stack
> that the OS and drives can reorder them"? How does that help keep the
> system healthy?
>

The operating system, disk controllers, and disks themselves can merge and
reorder requests to optimize performance.

Here's a relevant page with more details, if you're interested:
http://www.makelinux.net/books/lkd2/ch13lev1sec5
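To make the idea concrete, here is a toy model, not the Linux I/O scheduler: a hypothetical elevator that sorts whatever requests are queued at the same moment by disk position and merges contiguous ones. Roughly speaking, concurrent_reads bounds how many reads Cassandra can have outstanding at once, so it also bounds how many requests the lower layers get a chance to reorder. The sector numbers and the `queue_depth` batching are invented for illustration.

```python
from typing import List, Tuple

Request = Tuple[int, int]  # (start_sector, num_sectors), hypothetical units


def elevator_dispatch(queue: List[Request]) -> List[Request]:
    """Sort queued requests by start sector and merge contiguous ones."""
    dispatched: List[Request] = []
    for start, length in sorted(queue):
        if dispatched and dispatched[-1][0] + dispatched[-1][1] == start:
            # Back-merge: this request begins where the previous one ends.
            prev_start, prev_len = dispatched[-1]
            dispatched[-1] = (prev_start, prev_len + length)
        else:
            dispatched.append((start, length))
    return dispatched


def seek_distance(order: List[Request]) -> int:
    """Total head movement, assuming the head starts at sector 0."""
    pos, total = 0, 0
    for start, length in order:
        total += abs(start - pos)
        pos = start + length
    return total


def simulate(requests: List[Request], queue_depth: int) -> Tuple[int, int]:
    """Dispatch requests in batches of queue_depth; return (disk ops, seek)."""
    order: List[Request] = []
    for i in range(0, len(requests), queue_depth):
        order.extend(elevator_dispatch(requests[i:i + queue_depth]))
    return len(order), seek_distance(order)


if __name__ == "__main__":
    # Arrival order as the application issues reads (hypothetical sectors).
    reads = [(900, 8), (100, 8), (908, 8), (500, 8), (108, 8), (508, 8)]
    for depth in (1, 6):
        ops, seek = simulate(reads, depth)
        print(f"queue_depth={depth}: {ops} disk ops, seek distance {seek}")
    # queue_depth=1: 6 disk ops, seek distance 3716
    # queue_depth=6: 3 disk ops, seek distance 868
```

With only one request visible at a time nothing can be reordered; with all six queued, the same reads become three larger operations and far less head movement. On an SSD the seek-distance figure matters much less, which is the intuition behind the SSD question in this thread.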



> What really happens if we increase it to too high a value? (Maybe it affects
> other read or write operations because it eats up all the disk I/O?)
>


Yes

-Bryan

Re: tuning concurrent_reads param

Posted by Jimmy Lin <y2...@gmail.com>.

Sorry, I have a late follow-up question....

In the cassandra.yaml file, the concurrent_reads section has the following
comment:

# For workloads with more data than can fit in memory, Cassandra's
# bottleneck will be reads that need to fetch data from
# disk. "concurrent_reads" should be set to (16 * number_of_drives) in
# order to allow the operations to enqueue low enough in the stack
# that the OS and drives can reorder them.

What does it mean by "the operations to enqueue low enough in the stack
that the OS and drives can reorder them"? How does that help keep the
system healthy?
What really happens if we increase it to too high a value? (Maybe it affects
other read or write operations because it eats up all the disk I/O?)

thanks
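The rule of thumb in that comment is plain arithmetic. A small sketch (the helper names are hypothetical) that turns a drive count into the corresponding cassandra.yaml line; note that the default of 32 is what the rule gives for two drives.

```python
def recommended_concurrent_reads(number_of_drives: int) -> int:
    """Rule of thumb from the cassandra.yaml comment: 16 * number_of_drives."""
    if number_of_drives < 1:
        raise ValueError("number_of_drives must be at least 1")
    return 16 * number_of_drives


def yaml_line(number_of_drives: int) -> str:
    """Render the setting as it would appear in cassandra.yaml."""
    return f"concurrent_reads: {recommended_concurrent_reads(number_of_drives)}"


if __name__ == "__main__":
    for drives in (1, 2, 4):
        print(yaml_line(drives))
    # concurrent_reads: 16
    # concurrent_reads: 32
    # concurrent_reads: 64
```

This is only the starting point the comment suggests; it says nothing specific about SSDs, which is the question raised in this thread.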

On Wed, Oct 29, 2014 at 8:47 PM, Chris Lohfink <ch...@datastax.com>
wrote:

> There's a bit to it; it can sometimes use some tweaking, though. It's a good
> default for most systems, so I wouldn't increase it right off the bat. When
> using SSDs or something with a lot of horsepower it could be higher, though
> (e.g. i2.xlarge+ on EC2). If you monitor the number of active threads in the
> read thread pool (nodetool tpstats), you can see whether they are actually
> all busy or not. If it's near 32 (or whatever you set it to) all the time, it
> may be a bottleneck.

Re: tuning concurrent_reads param

Posted by Chris Lohfink <ch...@datastax.com>.

There's a bit to it; it can sometimes use some tweaking, though. It's a good
default for most systems, so I wouldn't increase it right off the bat. When
using SSDs or something with a lot of horsepower it could be higher, though
(e.g. i2.xlarge+ on EC2). If you monitor the number of active threads in the
read thread pool (nodetool tpstats), you can see whether they are actually
all busy or not. If it's near 32 (or whatever you set it to) all the time, it
may be a bottleneck.

---
Chris Lohfink
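The tpstats check above can be scripted. The sketch below parses `nodetool tpstats` text and reports whether the ReadStage pool's Active count is near the configured concurrent_reads. Assumptions: the pool table has the whitespace-separated columns Pool Name, Active, Pending, Completed, Blocked, All time blocked (as in Cassandra 2.x; other versions may differ), the 0.9 threshold is an arbitrary choice, and the sample output is made up. A single snapshot says little; you would sample repeatedly to see whether it stays near the limit "all the time".

```python
from typing import Dict

# Hypothetical sample, not output captured from a real node.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                        32       147        1843211         0                 0
MutationStage                     0         0         912345         0                 0
"""


def parse_tpstats(text: str) -> Dict[str, Dict[str, int]]:
    """Map each pool name to its Active and Pending counts."""
    pools: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # A pool row is a name followed by five integer columns;
        # the header line and any other sections are skipped.
        if len(fields) == 6 and all(f.isdigit() for f in fields[1:]):
            pools[fields[0]] = {"active": int(fields[1]), "pending": int(fields[2])}
    return pools


def read_stage_saturated(text: str, concurrent_reads: int = 32,
                         threshold: float = 0.9) -> bool:
    """True if ReadStage Active is at least threshold * concurrent_reads."""
    stage = parse_tpstats(text).get("ReadStage")
    if stage is None:
        raise ValueError("no ReadStage row found in tpstats output")
    return stage["active"] >= threshold * concurrent_reads


if __name__ == "__main__":
    print(parse_tpstats(SAMPLE)["ReadStage"])
    print(read_stage_saturated(SAMPLE))                       # True
    print(read_stage_saturated(SAMPLE, concurrent_reads=64))  # False
```

A Pending count that stays above zero is a related hint: it counts reads queued while waiting for a free thread in the pool.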
