Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2011/09/08 22:16:08 UTC
[jira] [Updated] (CASSANDRA-2890) Randomize (to some extend) the
choice of the first replica for counter increment
[ https://issues.apache.org/jira/browse/CASSANDRA-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-2890:
--------------------------------------
Fix Version/s: 1.1
David Hawthorne reports on the mailing list that he ran into this in the wild:
{quote}
It was exactly due to 2890 and the fact that the first replica is always the one with the lowest IP address. I patched Cassandra to pick a random node out of the replica set in findSuitableEndpoint in StorageProxy.java:
Random rng = new Random();
return endpoints.get(rng.nextInt(endpoints.size())); // instead of return endpoints.get(0);
Now the workload is evenly balanced across all 5 nodes, and I'm getting 2.5x the inserts/sec throughput.
Here's the behavior I saw, and "disk work" refers to the ReplicateOnWrite load of a counter insert:
One node will get RF/n of the disk work. Two nodes will always get 0 disk work.
In a 3-node cluster, one node's disk gets hit really hard, so you get the performance of a one-node cluster.
In a 6-node cluster, one node gets hit with 50% of the disk work, giving you the performance of a ~2-node cluster.
In a 10-node cluster, one node gets 30% of the disk work, giving you the performance of a ~3-node cluster.
I confirmed this behavior with 3-, 4-, and 5-node clusters.
{quote}
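The RF/n figures in the report above can be reproduced with a small model. The following is only a sketch under assumptions that are not stated in the ticket: replicas for each token range are RF consecutive nodes on a ring, every range receives the same number of counter writes, and node index stands in for IP address order. The class and method names (ReplicaChoiceSketch, pickRandom, leaderShare) are hypothetical and are not Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ReplicaChoiceSketch {
    /** Stand-in for the patched selection: any replica instead of endpoints.get(0). */
    static <T> T pickRandom(List<T> endpoints, Random rng) {
        return endpoints.get(rng.nextInt(endpoints.size()));
    }

    /**
     * Expected share of "first replica" work per node, assuming rf <= nodes.
     * Assumed model: the range starting at node s is replicated on s, s+1, ..., s+rf-1.
     */
    static double[] leaderShare(int nodes, int rf, boolean randomize) {
        double[] share = new double[nodes];
        for (int start = 0; start < nodes; start++) {
            List<Integer> replicas = new ArrayList<>();
            for (int i = 0; i < rf; i++) {
                replicas.add((start + i) % nodes);
            }
            if (randomize) {
                // A uniformly random pick gives each replica 1/rf of this range's writes.
                for (int r : replicas) {
                    share[r] += 1.0 / (nodes * rf);
                }
            } else {
                // "Lowest address first": the smallest index leads every write for this range.
                share[Collections.min(replicas)] += 1.0 / nodes;
            }
        }
        return share;
    }

    static double max(double[] values) {
        double best = 0.0;
        for (double v : values) {
            best = Math.max(best, v);
        }
        return best;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 6, 10}) {
            System.out.println(n + " nodes: busiest share fixed=" + max(leaderShare(n, 3, false))
                    + " random=" + max(leaderShare(n, 3, true)));
        }
    }
}
```

With RF=3 in this model, the fixed choice gives the lowest-numbered node a share of 1, 0.5, and 0.3 for 3, 6, and 10 nodes, and RF-1 nodes never lead a write. The random choice gives every node 1/n.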
> Randomize (to some extend) the choice of the first replica for counter increment
> --------------------------------------------------------------------------------
>
> Key: CASSANDRA-2890
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2890
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.8.0
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Labels: counters
> Fix For: 1.1
>
>
> Right now, we choose the first replica for a counter increment based solely on what the snitch returns. If the clients' requests are well balanced over the cluster and the snitch is not misconfigured, this should not be a problem, but this is probably too strong an assumption to make.
> The goal of this ticket is to change this to choose a random replica in the current data center instead.
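One possible shape for the behavior described in the ticket is sketched below. It is not the committed patch: the Endpoint type, the chooseLeader name, and the fallback to any live replica when the local data center has none are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class LocalDcReplicaChooser {
    /** Hypothetical endpoint descriptor; real code would ask the snitch for the data center. */
    static final class Endpoint {
        final String address;
        final String dataCenter;

        Endpoint(String address, String dataCenter) {
            this.address = address;
            this.dataCenter = dataCenter;
        }
    }

    /** Picks a uniformly random live replica from the local data center. */
    static Endpoint chooseLeader(List<Endpoint> liveReplicas, String localDc, Random rng) {
        List<Endpoint> local = new ArrayList<>();
        for (Endpoint e : liveReplicas) {
            if (e.dataCenter.equals(localDc)) {
                local.add(e);
            }
        }
        // Assumption: with no local replica alive, fall back to any live replica.
        List<Endpoint> candidates = local.isEmpty() ? liveReplicas : local;
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no live replicas");
        }
        return candidates.get(rng.nextInt(candidates.size()));
    }
}
```

Restricting the random pick to the local data center keeps the first write off cross-data-center links while still spreading the ReplicateOnWrite load over the local replicas.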