Posted to user@cassandra.apache.org by "ZAIDI, ASAD A" <az...@att.com> on 2017/07/11 14:24:43 UTC

reduced num_token = improved performance ??

Hi Folks,

Pardon me if I’m missing something obvious. I’m still using apache-cassandra 2.2 and planning to upgrade to 3.x.
I came across this JIRA [https://issues.apache.org/jira/browse/CASSANDRA-7032], which suggests that reducing num_tokens may improve the general performance of Cassandra, e.g. that num_tokens=16 instead of 256 may help.

Could you please advise whether having fewer tokens would provide real performance benefits, or whether it comes with any downsides that we should also consider? I’d much appreciate your insights.

Thank you
Asad

Re: reduced num_token = improved performance ??

Posted by Chris Lohfink <cl...@gmail.com>.
Probably worth mentioning that some operational procedures, like repairs
and bootstrapping, are helped massively by using fewer tokens. Incremental
repair is, I would say, one of the things most affected by it, since fewer
tokens mean fewer local ranges to iterate through and less anticompaction.
I would highly recommend using far fewer than 256 in 3.x.
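Chris's point about local ranges can be put in rough numbers. The sketch below is a simplification, assuming random token assignment, the Murmur3 token range and a SimpleStrategy-like replica rule (not Cassandra's actual code): each node ends up holding replicas of about num_tokens × RF ranges, so any repair that works range by range has that many units of work per node.

```python
import random

# Murmur3Partitioner tokens are signed 64-bit integers.
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

def random_ring(num_nodes, num_tokens, seed=0):
    """Give every node num_tokens random tokens; return the ring as a
    sorted list of (token, node) pairs."""
    rng = random.Random(seed)
    return sorted((rng.randint(MIN_TOKEN, MAX_TOKEN), node)
                  for node in range(num_nodes) for _ in range(num_tokens))

def ranges_per_node(ring, rf):
    """Count the token ranges each node holds a replica of, using a
    SimpleStrategy-like rule (an assumption): range (previous token, token]
    lives on the node owning `token` plus the next rf - 1 distinct nodes
    clockwise."""
    num_nodes = len({node for _, node in ring})
    if rf > num_nodes:
        raise ValueError("rf cannot exceed the number of nodes")
    counts = [0] * num_nodes
    for i in range(len(ring)):
        replicas = []
        j = i
        while len(replicas) < rf:  # walk clockwise until rf distinct nodes are found
            node = ring[j % len(ring)][1]
            if node not in replicas:
                replicas.append(node)
            j += 1
        for node in replicas:
            counts[node] += 1
    return counts

for num_tokens in (256, 16):
    counts = ranges_per_node(random_ring(6, num_tokens), rf=3)
    print(num_tokens, sum(counts) // len(counts))
# prints "256 768" then "16 48"
```

The cluster size (6 nodes) and RF=3 are arbitrary illustration values; the average is always num_tokens × RF because every range is replicated exactly RF times.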

Chris

On Tue, Jul 11, 2017 at 8:36 PM, Justin Cameron <ju...@instaclustr.com>
wrote:


Re: reduced num_token = improved performance ??

Posted by Justin Cameron <ju...@instaclustr.com>.
Hi,

Using fewer vnodes means you'll have a higher chance of hot spots in your
cluster. Hot spots in Cassandra are nodes that, by random chance, are
responsible for a higher percentage of the token space than others. This
means they will receive more data and also more traffic/load than other
nodes in the cluster.
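The hot-spot effect is easy to reproduce with a small simulation. This is a sketch, assuming every node picks its tokens uniformly at random (the behaviour CASSANDRA-7032 improves on) and counting only primary ownership, not replicas; the 12-node cluster and the trial count are arbitrary choices.

```python
import random
import statistics

# Murmur3 tokens span [-2**63, 2**63 - 1]; shifting to [0, 2**64) keeps the maths simple.
RING_SIZE = 2**64

def ownership(num_nodes, num_tokens, seed=0):
    """Fraction of the ring each node owns as primary owner when every
    node picks num_tokens tokens uniformly at random."""
    rng = random.Random(seed)
    ring = sorted((rng.randrange(RING_SIZE), node)
                  for node in range(num_nodes) for _ in range(num_tokens))
    owned = [0] * num_nodes
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]  # for i == 0 this wraps to the largest token
        owned[node] += (token - prev_token) % RING_SIZE
    return [o / RING_SIZE for o in owned]

def imbalance(num_nodes, num_tokens, trials=100):
    """Average ratio of the busiest node's share to its fair share (1/num_nodes)."""
    ratios = [max(ownership(num_nodes, num_tokens, seed)) * num_nodes
              for seed in range(trials)]
    return statistics.mean(ratios)

for t in (1, 16, 256):
    print(f"num_tokens={t:>3}: busiest node owns ~{imbalance(12, t):.2f}x its fair share")
```

The busiest node's excess shrinks as num_tokens grows, which is the trade-off Justin describes: fewer random tokens per node means more variance in token-space ownership.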

CASSANDRA-7032 goes a long way towards addressing this issue by allocating
vnode tokens more intelligently, rather than just randomly assigning them.
If you're using a version of Cassandra that contains this feature (3.0+),
you can use a smaller number of vnodes in your cluster. Note that the new
allocator is opt-in: it is used when a joining node has
allocate_tokens_for_keyspace set in cassandra.yaml.

A high number of vnodes won't affect performance for most Cassandra
workloads, but if you're running tasks that need to do token-range scans
(such as Spark), there is usually a significant performance hit.
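The reason range scans suffer is that each vnode token ends one token range, so a full scan over N nodes with num_tokens each has about N × num_tokens ranges to visit. A minimal sketch, assuming a scanner that issues one CQL query per range and a hypothetical partition key column `pk` (Spark and similar tools batch and parallelise this in their own ways, so this illustrates the count, not any connector's behaviour):

```python
import random

def scan_predicates(tokens):
    """Build one token() predicate per vnode range, as a simple
    range-by-range scanner might. `pk` is a hypothetical partition key
    column. The range that wraps around the ring, (largest token,
    smallest token], is split into two predicates."""
    t = sorted(tokens)
    preds = [f"token(pk) > {t[i - 1]} AND token(pk) <= {t[i]}"
             for i in range(1, len(t))]
    preds.append(f"token(pk) > {t[-1]}")  # largest token up to the end of the ring
    preds.append(f"token(pk) <= {t[0]}")  # start of the ring up to the smallest token
    return preds

rng = random.Random(42)
for num_tokens in (256, 16):
    # hypothetical 6-node cluster, each node with num_tokens random Murmur3 tokens
    tokens = [rng.randint(-2**63, 2**63 - 1) for _ in range(6 * num_tokens)]
    print(num_tokens, len(scan_predicates(tokens)))
# prints "256 1537" then "16 97"
```

Each query carries fixed overhead, so cutting the range count by 16x is where the Spark-side gain comes from.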

If you're on C* 3.0+ and are using Spark (or similar workloads; the
Cassandra Lucene index plugin is also affected), then I'd recommend using
fewer vnodes: 16 would be OK. You'll probably still see some variance in
token-space ownership between nodes, but the trade-off for better Spark
performance will likely be worth it.

Justin

-- 


Justin Cameron
Senior Software Engineer
https://www.instaclustr.com/


This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).
