Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2011/09/08 22:16:08 UTC

[jira] [Updated] (CASSANDRA-2890) Randomize (to some extent) the choice of the first replica for counter increment

     [ https://issues.apache.org/jira/browse/CASSANDRA-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2890:
--------------------------------------

    Fix Version/s: 1.1

David Hawthorne reports on the mailing list that he ran into this in the wild:

{quote}
It was exactly due to 2890, and the fact that the first replica is always the one with the lowest-valued IP address.  I patched Cassandra to pick a random node out of the replica set in StorageProxy.java findSuitableEndpoint:

Random rng = new Random();

return endpoints.get(rng.nextInt(endpoints.size()));  // instead of return endpoints.get(0);

Now the workload is evenly balanced among all 5 nodes and I'm getting 2.5x the inserts/sec throughput.

Here's the behavior I saw, and "disk work" refers to the ReplicateOnWrite load of a counter insert:

One node will get RF/n of the disk work.  Two nodes will always get 0 disk work.

In a 3-node cluster, 1 node gets its disk hit really hard.  You get the performance of a one-node cluster.
In a 6-node cluster, 1 node gets hit with 50% of the disk work, giving you the performance of a ~2-node cluster.
In a 10-node cluster, 1 node gets 30% of the disk work, giving you the performance of a ~3-node cluster.

I confirmed this behavior with 3-, 4-, and 5-node clusters.
{quote}
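
The percentages in the report follow from the RF/n rule if the replication factor is 3. That value is an assumption on my part: the report does not state it, but it matches the 3-, 6-, and 10-node figures. A minimal Java sketch of the arithmetic, with hypothetical class and method names that are not part of Cassandra:

```java
import java.util.Locale;

/** Back-of-the-envelope model of the quoted figures. Not Cassandra code. */
final class CounterHotspotEstimate {

    /** Fraction of the ReplicateOnWrite disk work landing on the hot node: RF / n, capped at 1. */
    static double hotNodeShare(int replicationFactor, int nodeCount) {
        return Math.min(1.0, (double) replicationFactor / nodeCount);
    }

    /** Size of the evenly loaded cluster that the hot node limits you to: n / RF. */
    static double effectiveNodes(int replicationFactor, int nodeCount) {
        return 1.0 / hotNodeShare(replicationFactor, nodeCount);
    }

    public static void main(String[] args) {
        int rf = 3; // ASSUMPTION: replication factor inferred from the quoted numbers
        for (int n : new int[] {3, 6, 10}) {
            System.out.printf(Locale.ROOT, "n=%d hot-node share=%.0f%% effective=%.1f nodes%n",
                    n, 100 * hotNodeShare(rf, n), effectiveNodes(rf, n));
        }
    }
}
```

Under that assumption the loop reports shares of 100%, 50%, and 30%, and effective sizes of 1.0, 2.0, and 3.3 nodes, in line with the reported "one-node", "~2 node", and "~3 node" behavior.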

> Randomize (to some extent) the choice of the first replica for counter increment
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2890
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2890
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: counters
>             Fix For: 1.1
>
>
> Right now, we choose the first replica for a counter increment based solely on what the snitch returns. If client requests are well balanced over the cluster and the snitch is not misconfigured, this should not be a problem, but this is probably too strong an assumption to make.
> The goal of this ticket is to change this to choose a random replica in the current data center instead.
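
One way the stated goal could look, as a rough Java sketch and not the actual patch for this ticket. The class, the method, and the plain endpoint-to-data-center map standing in for the snitch are all hypothetical. Falling back to the full replica list when no replica is local is also an assumption, since the ticket does not say what should happen in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper. Does not use or mirror Cassandra's real API. */
final class LocalReplicaPicker {

    /**
     * Picks the first replica for a counter increment uniformly at random
     * among the live replicas in the coordinator's data center.
     *
     * @param liveReplicas live replica endpoints for the key (hypothetical string ids)
     * @param dcOf         endpoint-to-data-center map, standing in for the snitch
     * @param localDc      data center of the coordinator
     */
    static String pickFirstReplica(List<String> liveReplicas,
                                   Map<String, String> dcOf,
                                   String localDc) {
        if (liveReplicas.isEmpty()) {
            throw new IllegalArgumentException("no live replicas");
        }
        List<String> local = new ArrayList<>();
        for (String replica : liveReplicas) {
            if (localDc.equals(dcOf.get(replica))) {
                local.add(replica);
            }
        }
        // ASSUMPTION: with no local replica, choose among all live replicas.
        List<String> candidates = local.isEmpty() ? liveReplicas : local;
        // ThreadLocalRandom avoids allocating a new Random on every request.
        return candidates.get(ThreadLocalRandom.current().nextInt(candidates.size()));
    }
}
```

Restricting the choice to the local data center keeps the extra hop for the increment off cross-data-center links, while the random pick spreads the ReplicateOnWrite work over the local replicas instead of always loading the first one.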
