Posted to user@cassandra.apache.org by Vsevolod Filaretov <vs...@gmail.com> on 2018/06/15 11:12:53 UTC

Good day, everyone,

I have three questions about Cassandra paging mechanics and regulating
cluster usage.

1) Am I correct to assume that the larger the page size a user session
sets, the larger the share of cluster/coordinator node resources that
session will hog?

2) Do I understand correctly that page size (imagine we have no timeout
settings) is limited by the RAM and IOPS I am willing to hand to a single
user session?

3) Am I correct to assume that the page size and read request timeout I
allow directly determine the chance of locking a node to a single user's
requests?


Best regards,

Vsevolod.

Re:

Posted by Jeff Jirsa <jj...@gmail.com>.
Just assume that the rows you read in a page all end up on the heap at the same time.

If you’re reading 1000 rows of 100 bytes, no big deal: you’ve got 100 KB per read thread on the heap.

If you’re reading 100 rows of 1 MB each, now you’ve got 100 MB per read thread on the heap.

Assuming an 8 GB heap with a 2 GB young gen, the first example is probably no problem even with dozens of concurrent reads, but the second will trigger a young GC every 10-15 reads (and objects could be promoted to the old gen, depending on how many concurrent reads you’re doing).
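
The arithmetic above can be written out as a short sketch. The function names are made up for illustration, and the model is deliberately naive: it counts only row payload and treats the whole young gen as available, so it gives an upper bound on reads per young GC rather than a prediction.

```python
KB = 1000
MB = 1000 * KB
GB = 1000 * MB


def page_heap_bytes(rows_per_page: int, avg_row_bytes: int) -> int:
    """Rough heap held by one read, assuming the whole page is resident at once."""
    return rows_per_page * avg_row_bytes


def reads_to_fill(young_gen_bytes: int, per_read_bytes: int) -> int:
    """Number of such reads that allocate one young gen's worth of data."""
    return young_gen_bytes // per_read_bytes


small = page_heap_bytes(1000, 100)     # 1000 rows of 100 bytes
large = page_heap_bytes(100, 1 * MB)   # 100 rows of 1 MB

print(small, reads_to_fill(2 * GB, small))   # 100000 20000
print(large, reads_to_fill(2 * GB, large))   # 100000000 20
```

The naive bound for the second case is 20 reads; a real node would likely collect sooner, since only part of the young gen is usable for new allocations and other work allocates too.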




-- 
Jeff Jirsa


> On Jun 19, 2018, at 1:53 AM, Vsevolod Filaretov <vs...@gmail.com> wrote:
> 
> Kurt, thank you very much for your answer! Your remark on GC totally changed how I think about Cassandra resource usage.
> 
> So, more questions for the list.
> 
> What is generally considered
> 
> 1) a "too large" page size,
> 2) a "large" page size,
> 3) a "normal conditions" page size?
> 
> How exactly does fetch size affect CPU? Can too large a page size cause severe CPU usage from constant GC, thus hurting Cassandra's read performance (because the CPU is busy GCing instead of doing other work)?
> 
> Thank you all very much!

Re:

Posted by Vsevolod Filaretov <vs...@gmail.com>.
Kurt, thank you very much for your answer! Your remark on GC totally
changed how I think about Cassandra resource usage.

So, more questions for the list.

What is generally considered

1) a "too large" page size,
2) a "large" page size,
3) a "normal conditions" page size?

How exactly does fetch size affect CPU? Can too large a page size cause
severe CPU usage from constant GC, thus hurting Cassandra's read
performance (because the CPU is busy GCing instead of doing other work)?

Thank you all very much!

Mon, 18 Jun 2018, 14:28 kurt greaves <ku...@instaclustr.com>:

>> 1) Am I correct to assume that the larger the page size a user session
>> sets, the larger the share of cluster/coordinator node resources that
>> session will hog?
>> 2) Do I understand correctly that page size (imagine we have no timeout
>> settings) is limited by the RAM and IOPS I am willing to hand to a single
>> user session?
>
> Yes for both of the above. More rows will be pulled into memory
> simultaneously with a larger page size, thus using more memory and IO.
>
>> 3) Am I correct to assume that the page size and read request timeout I
>> allow directly determine the chance of locking a node to a single user's
>> requests?
>
> Concurrent reads can occur on a node, so it shouldn't "lock" the node to a
> single user's request. However, you can overload the node, which may be
> effectively the same thing. Don't set page sizes too high, otherwise the
> coordinator of the query will end up doing a lot of GC.

Re:

Posted by kurt greaves <ku...@instaclustr.com>.
> 1) Am I correct to assume that the larger the page size a user session
> sets, the larger the share of cluster/coordinator node resources that
> session will hog?
> 2) Do I understand correctly that page size (imagine we have no timeout
> settings) is limited by the RAM and IOPS I am willing to hand to a single
> user session?

Yes for both of the above. More rows will be pulled into memory
simultaneously with a larger page size, thus using more memory and IO.

> 3) Am I correct to assume that the page size and read request timeout I
> allow directly determine the chance of locking a node to a single user's
> requests?

Concurrent reads can occur on a node, so it shouldn't "lock" the node to a
single user's request. However, you can overload the node, which may be
effectively the same thing. Don't set page sizes too high, otherwise the
coordinator of the query will end up doing a lot of GC.
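
To make the point about page size and memory concrete, here is a toy model in plain Python. It does not talk to Cassandra or use any driver API: `paged` and `peak_page_bytes` are hypothetical helpers, and the 10,000 rows of 100 bytes are invented data. It only shows that the bytes held at once grow linearly with the page size.

```python
from typing import Iterator, List


def paged(rows: List[bytes], page_size: int) -> Iterator[List[bytes]]:
    """Yield rows one page at a time, the way a paging client receives them."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


def peak_page_bytes(rows: List[bytes], page_size: int) -> int:
    """Largest number of payload bytes held in memory for any single page."""
    return max(
        (sum(len(r) for r in page) for page in paged(rows, page_size)),
        default=0,
    )


# Hypothetical data set: 10,000 rows of 100 bytes each.
rows = [b"x" * 100 for _ in range(10_000)]

for size in (100, 1000, 5000):
    print(size, peak_page_bytes(rows, size))
# 100 10000
# 1000 100000
# 5000 500000
```

On a real node the same scaling applies per concurrent read, which is why a large page size multiplied by many simultaneous queries is what pushes the coordinator into heavy GC.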