Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2014/11/04 21:51:34 UTC

[jira] [Comment Edited] (CASSANDRA-6976) Determining replicas to query is very slow with large numbers of nodes or vnodes

    [ https://issues.apache.org/jira/browse/CASSANDRA-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196749#comment-14196749 ] 

Ariel Weisberg edited comment on CASSANDRA-6976 at 11/4/14 8:50 PM:
--------------------------------------------------------------------

I tried reproducing this on my i5 Sandy Bridge with 4GB of RAM using CCM. I modified StorageProxy.getRangeSlice to log its elapsed time in microseconds and removed the code from CASSANDRA-6906 that short-circuits away from StorageProxy.getRestrictedRanges. I also timed getRestrictedRanges and found that the loop inside it dominates.
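
For reference, a minimal, self-contained sketch of the measurement approach (System.nanoTime converted to microseconds). This is not the actual patch; the class and method names are made up for illustration:

{code:java}
import java.util.concurrent.TimeUnit;

public class MicrosTimer
{
    // Runs a block and returns the elapsed wall-clock time in microseconds,
    // the same nanoTime-based measurement used for the numbers below.
    public static long timeMicros(Runnable block)
    {
        long start = System.nanoTime();
        block.run();
        return TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
    }

    public static void main(String[] args)
    {
        // Stand-in for the timed call to StorageProxy.getRestrictedRanges(...)
        long micros = timeMicros(() -> { /* work being measured */ });
        System.out.println("took " + micros + " us");
    }
}
{code}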

I set MAX_NEW_SIZE to 32m and MAX_HEAP_SIZE to 256m.

ccm create -n 3 --vnodes foobar
Then edit ~/.ccm/cluster.conf to set num_tokens to 1536
ccm updateconf
ccm start

ccm node1 showlog says the config is using 1536 tokens. I can't set num_tokens any higher.

My benchmark client:
bq. printf 'select * from system.local;%.0s' {1..50000} | ./cqlsh > /dev/null

The loop in getRestrictedRanges iterated 4609 times in 450 microseconds, and StorageProxy.getRangeSlice took 6 milliseconds overall. With num_tokens = 256 the loop iterated 769 times in 80 microseconds, and getRangeSlice took 1300 microseconds in total.
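Back-of-envelope: with 3 nodes the ring has 3 * 1536 = 4608 tokens in the first case and 3 * 256 = 768 in the second, so the iteration counts (4609 and 769) work out to roughly one iteration per vnode in the cluster.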

Performance cuts in half twice: once after the first handful of queries and again after several thousand, which is probably the JIT kicking in.

So yeah, total work done looks linear in the number of tokens. It's not clear yet whether this is simply because the query I am using actually needs to hit every single vnode, so the penalty is paid once per vnode and is therefore large, or whether this penalty would be paid by every single query regardless of the expected amount of work.
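
To make the linearity concrete, here is a much-simplified, self-contained sketch of the kind of range-splitting loop involved. This is not the actual StorageProxy.getRestrictedRanges code; the long tokens and long[] ranges are stand-ins for the real token/bounds types. A query that spans the whole ring has to split at every token, so the work is proportional to nodes * num_tokens:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class RingSplitSketch
{
    // Split the half-open range (left, right] at every ring token strictly inside it.
    // A full-ring query visits every token, so the work is linear in the total vnode count.
    static List<long[]> restrictedRanges(TreeSet<Long> ring, long left, long right)
    {
        List<long[]> ranges = new ArrayList<>();
        long lower = left;
        for (long token : ring.tailSet(left, false))
        {
            if (token >= right)
                break;
            ranges.add(new long[]{ lower, token });
            lower = token;
        }
        ranges.add(new long[]{ lower, right });
        return ranges;
    }

    public static void main(String[] args)
    {
        TreeSet<Long> ring = new TreeSet<>();
        for (long t = 0; t < 3 * 1536; t++)   // 3 nodes, num_tokens = 1536
            ring.add(t * 1000);
        System.out.println(restrictedRanges(ring, Long.MIN_VALUE, Long.MAX_VALUE).size());
    }
}
{code}

Running it with 3 * 1536 tokens prints 4609 ranges, which lines up with the iteration count above.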

Edit:
Adding a pastebin of some measurements from IRC: https://www.irccloud.com/pastebin/ojwzOEr1


> Determining replicas to query is very slow with large numbers of nodes or vnodes
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6976
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6976
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>              Labels: performance
>             Fix For: 2.1.2
>
>
> As described in CASSANDRA-6906, this can be ~100ms for a relatively small cluster with vnodes, which is longer than it will spend in transit on the network. This should be much faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)