Posted to user@cassandra.apache.org by John Pyeatt <jo...@singlewire.com> on 2013/11/20 13:53:48 UTC

nodetool repair seems to increase linearly with number of keyspaces

We have an application that has been designed to use potentially 100s of
keyspaces (one for each company).

One thing we are noticing is that the run time of nodetool repair across
all of the keyspaces seems to increase linearly with the number of
keyspaces. For example, if we have a 6-node EC2 (m1.large) cluster across 3
Availability Zones and create 20 keyspaces, a nodetool repair -pr on one
node takes 3 hours, even with no data in any of the keyspaces. If I bump
that up to 40 keyspaces, it takes 6 hours.
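
To make the arithmetic concrete, here is a minimal Python sketch that fits a
straight line through the origin to the two timings quoted (20 keyspaces in
3 hours, 40 in 6) and projects it forward. It assumes the linear trend simply
continues, which two data points cannot confirm. The function names are made
up for illustration.

```python
def hours_per_keyspace(observations):
    """Least-squares slope of a line through the origin.

    observations: list of (keyspace_count, repair_hours) pairs.
    """
    numerator = sum(k * h for k, h in observations)
    denominator = sum(k * k for k, _ in observations)
    return numerator / denominator


def projected_repair_hours(observations, target_keyspaces):
    """Extrapolate repair time, ASSUMING the linear trend keeps holding."""
    return hours_per_keyspace(observations) * target_keyspaces


if __name__ == "__main__":
    # The two measurements reported above: (keyspaces, hours for repair -pr).
    measured = [(20, 3.0), (40, 6.0)]
    minutes = hours_per_keyspace(measured) * 60
    print(f"about {minutes:.0f} minutes per keyspace")
    for target in (100, 300):
        hours = projected_repair_hours(measured, target)
        print(f"{target} keyspaces: roughly {hours:.0f} hours")
```

If the trend held, a few hundred keyspaces would push a single -pr repair
well past a day, which is the concern raised below.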

Is this the behaviour you would expect?

Is there anything you can think of (short of redesigning the cluster to
limit keyspaces) to increase the performance of the nodetool repairs?
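
One way to find out where the time goes is to repair one keyspace at a time
and record how long each run takes. The sketch below is only an
illustration: it assumes nodetool is on the PATH and is invoked as
"nodetool repair -pr <keyspace>", and the keyspace names are hypothetical.
The demo at the bottom is a dry run that prints the commands instead of
executing them.

```python
import subprocess
import time


def repair_command(keyspace, primary_range_only=True):
    """Build the argument list for repairing a single keyspace."""
    cmd = ["nodetool", "repair"]
    if primary_range_only:
        cmd.append("-pr")
    cmd.append(keyspace)
    return cmd


def time_repairs(keyspaces, run=subprocess.check_call, clock=time.monotonic):
    """Run one repair per keyspace and return {keyspace: elapsed seconds}.

    `run` is injectable so the loop can be exercised without a cluster.
    """
    timings = {}
    for keyspace in keyspaces:
        start = clock()
        run(repair_command(keyspace))
        timings[keyspace] = clock() - start
    return timings


if __name__ == "__main__":
    # Dry run: print each command rather than calling nodetool.
    def dry_run(cmd):
        print("would run:", " ".join(cmd))

    # Hypothetical keyspace names, one per company.
    time_repairs(["company_a", "company_b"], run=dry_run)
```

Dropping the run argument makes the loop call nodetool for real, in which
case the per-keyspace timings show whether the cost is roughly constant per
keyspace.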

My obvious concern is that as this application grows and more companies
start using it, we will eventually have too many keyspaces to perform
repairs on the cluster.

-- 
John Pyeatt
Singlewire Software, LLC
www.singlewire.com
------------------
608.661.1184
john.pyeatt@singlewire.com

Re: nodetool repair seems to increase linearly with number of keyspaces

Posted by "Christopher J. Bottaro" <cj...@academicworks.com>.

We only have a single CF per keyspace.  Actually we have 2, but one is tiny
(only has 2 rows in it and is queried once a month or less).

Yup, using vnodes with 256 tokens.

Cassandra 1.2.10.

-- C


On Mon, Nov 25, 2013 at 2:28 PM, John Pyeatt <jo...@singlewire.com> wrote:

> Mr. Bottaro,
>
> About how many column families are in your keyspaces? We have 28 per
> keyspace.
>
> Are you using vnodes? We are, and they are set to 256.
>
> What version of Cassandra are you running? We are running 1.2.9.
>
>

Re: nodetool repair seems to increase linearly with number of keyspaces

Posted by Robert Coli <rc...@eventbrite.com>.

On Mon, Nov 25, 2013 at 12:28 PM, John Pyeatt <jo...@singlewire.com> wrote:

> Are you using vnodes? We are, and they are set to 256.
> What version of Cassandra are you running? We are running 1.2.9.
>

Vnode performance with respect to repair is tracked in this JIRA issue:

https://issues.apache.org/jira/browse/CASSANDRA-5220
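
A rough way to see why vnodes matter here is to count units of repair work.
The Python sketch below ASSUMES (the thread does not confirm this) that a
-pr repair does one unit of work per token range, per column family, per
keyspace. It uses only the figures reported in this thread: 20 keyspaces
with 28 column families taking 3 hours on one cluster, and about 300
keyspaces with 2 column families taking 4 hours on the other, both with 256
vnode tokens. The function names are made up for illustration.

```python
def repair_units(keyspaces, cfs_per_keyspace, ranges_per_node):
    """Units of work under the assumed model: one per (keyspace, CF, range)."""
    return keyspaces * cfs_per_keyspace * ranges_per_node


def seconds_per_unit(hours, units):
    """Average wall-clock seconds spent per assumed unit of work."""
    return hours * 3600 / units


if __name__ == "__main__":
    # (label, keyspaces, CFs per keyspace, vnode tokens, reported hours)
    clusters = [
        ("6-node, 28 CFs per keyspace", 20, 28, 256, 3),
        ("4-node, 2 CFs per keyspace", 300, 2, 256, 4),
    ]
    for label, keyspaces, cfs, ranges, hours in clusters:
        units = repair_units(keyspaces, cfs, ranges)
        rate = seconds_per_unit(hours, units)
        print(f"{label}: {units} units, {rate:.3f} s per unit")
```

Under that assumption the two clusters do a similar amount of work per hour,
which would fit a fixed overhead per range and column family rather than a
cost driven by data size. The clusters differ in hardware, node count, and
data volume, so this is a sanity check and not a measurement.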

Unfortunately, in Cassandra 2.0 repair has also been changed to run
serially across the replicas in the replica set by default, which is
(unless I've misunderstood something...) likely to make it even slower, in
direct proportion to the replication factor (RF).

https://issues.apache.org/jira/browse/CASSANDRA-5950

This will probably necessitate revisiting this issue:

https://issues.apache.org/jira/browse/CASSANDRA-5850

=Rob

Re: nodetool repair seems to increase linearly with number of keyspaces

Posted by John Pyeatt <jo...@singlewire.com>.

Mr. Bottaro,

About how many column families are in your keyspaces? We have 28 per
keyspace.

Are you using vnodes? We are, and they are set to 256.

What version of Cassandra are you running? We are running 1.2.9.




Re: nodetool repair seems to increase linearly with number of keyspaces

Posted by "Christopher J. Bottaro" <cj...@academicworks.com>.

We have the same setup: one keyspace per client, and currently about 300
keyspaces. nodetool repair takes a long time: 4 hours with -pr on a single
node. We have a 4-node cluster with about 10 GB per node. Unfortunately,
we haven't been keeping track of how the running time changes as the number
of keyspaces, or the load, increases.

-- C

