Posted to user@cassandra.apache.org by Vegard Berget <po...@fantasista.no> on 2013/06/10 15:04:38 UTC

Changing replication factor

Hi,
If one increases the replication factor of a keyspace and then does a
repair, how will this affect the performance of the affected nodes?
Could we risk the nodes being (more or less) unresponsive while repair
is going on?  The nodes I am speaking of contain ~100 GB of data.
Also, some of the keyspaces for which I am considering increasing the
replication factor contain counter column families (currently rf:1).
I think I have read that adding replication to counter CFs will
affect performance negatively. Is this correct?
Cassandra version is 1.1.7.
.vegard,

Re: Changing replication factor

Posted by Robert Coli <rc...@eventbrite.com>.
On Mon, Jun 17, 2013 at 5:33 AM, Vegard  Berget <po...@fantasista.no> wrote:
> "invalid counter shard detected; (X, Y, Z) and (X, Y, Z2) differ only in
> count; will pick highest to self-heal; this indicates a bug or corruption
> generated a bad counter shard"

https://issues.apache.org/jira/browse/CASSANDRA-4417
and
https://issues.apache.org/jira/browse/CASSANDRA-4071

tl;dr - nobody fully understands in what case they are created (though
there are a few likely candidates) or how to fix them without
potential loss of counter accuracy, and nobody seems to be working on
a solution at this time.

https://issues.apache.org/jira/browse/CASSANDRA-5026

Reduces the frequency of the log messages.

https://issues.apache.org/jira/browse/CASSANDRA-4775

Is the ticket where Counters 2.0 design is occurring.

=Rob

Re: Changing replication factor

Posted by Vegard Berget <po...@fantasista.no>.
Hi,
Thank you for the information. I have increased the rf, and I think the
increase we have seen in CPU load etc. is due to the counter CFs,
which are almost write-only (read a few times a day).  The load
increase is noticeable, but no problem. Repair went fine.

But I noticed that when I increased rf for a counter column and (for
some completely different reasons) took one node down, and after that
ran repair, I would get multiple lines like this in system.log:

"invalid counter shard detected; (X, Y, Z) and (X, Y, Z2) differ only in
count; will pick highest to self-heal; this indicates a bug or corruption
generated a bad counter shard"

I guess this is because, while the node was down, the counters got out
of sync and it now just needs to pick the highest?  In my case this
will be (more or less) correct, since the sync problem happened because
of a downed node, which means _all_ increases happened on the other
node, and that node will have the correct number?  I am just curious, as
some minor errors in the counters would be no problem for us.
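
For illustration only, the rule described by the log message ("differ only in count; will pick highest") can be sketched as below. This is a minimal sketch, not Cassandra's actual code: it assumes the three values in the message are (counter id, clock, count), and the function name merge_shards is hypothetical.

```python
def merge_shards(a, b):
    """Merge two shards that belong to the same counter id.

    Each shard is a hypothetical (counter_id, clock, count) tuple.
    Returns (winning_shard, conflict), where conflict is True only when
    the two shards have the same clock but different counts, which is
    the situation the log message reports.
    """
    id_a, clock_a, count_a = a
    id_b, clock_b, count_b = b
    if id_a != id_b:
        raise ValueError("shards belong to different counter ids")
    if clock_a != clock_b:
        # Normal case: the shard with the newer clock wins.
        return (a if clock_a > clock_b else b), False
    if count_a != count_b:
        # Same id and clock, different count: pick the highest count.
        return (a if count_a > count_b else b), True
    # Identical shards: nothing to resolve.
    return a, False


if __name__ == "__main__":
    print(merge_shards(("n1", 5, 10), ("n1", 5, 12)))
    print(merge_shards(("n1", 4, 99), ("n1", 5, 12)))
```

Note that picking the highest count is only a tie-break; as the message itself says, it does not establish which of the two counts is the accurate one.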
.vegard,
----- Original Message -----
From: user@cassandra.apache.org
To:, "Vegard Berget" 
Cc:
Sent:Fri, 14 Jun 2013 17:20:26 -0700
Subject:Re: Changing replication factor

 On Mon, Jun 10, 2013 at 6:04 AM, Vegard Berget  wrote:
 > If one increases the replication factor of a keyspace and then do a repair,
 > how will this affect the performance of the affected nodes? Could we risk
 > the nodes being (more or less) unresponsive while repair is going on?

 Repair is a relatively heavyweight activity (the heaviest a cassandra
 node can do!) which requires significant headroom in terms of CPU,
 heap memory and disk space. It is possible that nodes could become
 unavailable transiently during the repair, but unless they are already
 very busy they should not become completely unresponsive. For one
 thing, both compaction and streaming respect throttles which are
 designed to minimize the impact of the streaming/compaction workload
 resulting from repair.

 > The nodes I am speaking of contains ~100gb of data.

 This is a relatively small amount of data per node, which makes the
 impact of Repair less severe.

 > Also, some of the keyspaces I am considering increase the replication factor
 > for contains Counter Column Families (has rf:1). I think I have read that
 > adding replication to counter cfs will affect performance negatively, is
 > this correct?

 Per Sylvain (one of the primary authors of the Counters codebase) [1]:

 "
 For counters, it's a little bit different. At RF=3, for each inserts,
 one node is doing a write *and* a read, while the two other nodes are
 only doing a write. So given that the read takes a time is non
 negligible, you should see simple improvement a RF=3 compared to RF=1
 because each node gets 1/3 of the reads (involved in the counter write)
 it would get if it was the only replica. Now if the write time were
 negligible compared to the read time, then yes you would see roughly a
 3x increase. But while writes are still faster than reads in Cassandra,
 reads a now fairly fast too (but all this depends on other factor like
 how much the caches helps, etc...), so it will likely be less than a 3x
 increase. Should be noticeable though.
 "

 I interpret the above to mean that RF=3 is actually slightly *faster*
 for Counters than RF=1.

 =Rob

 [1] http://mail-archives.apache.org/mod_mbox/cassandra-user/201110.mbox/%3CCAKkz8Q0ThzzSBu2370MX6jPeEC3Lh17Pjmv1koJGgAuaJupCtQ@mail.gmail.com%3E


Re: Changing replication factor

Posted by Robert Coli <rc...@eventbrite.com>.
On Mon, Jun 10, 2013 at 6:04 AM, Vegard  Berget <po...@fantasista.no> wrote:
> If one increases the replication factor of a keyspace and then do a repair,
> how will this affect the performance of the affected nodes? Could we risk
> the nodes being (more or less) unresponsive while repair is going on?

Repair is a relatively heavyweight activity (the heaviest a cassandra
node can do!) which requires significant headroom in terms of CPU,
heap memory and disk space. It is possible that nodes could become
unavailable transiently during the repair, but unless they are already
very busy they should not become completely unresponsive. For one
thing, both compaction and streaming respect throttles which are
designed to minimize the impact of the streaming/compaction workload
resulting from repair.

>  The nodes I am speaking of contains ~100gb of data.

This is a relatively small amount of data per node, which makes the
impact of Repair less severe.
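
As a rough back-of-the-envelope illustration of how a streaming throttle bounds the work, the sketch below estimates how long it would take to move a given amount of data at a fixed rate. It is only arithmetic under stated assumptions: the function name transfer_seconds is hypothetical, the 400 megabit/s figure is a placeholder rather than a recommended or measured setting, and a real repair would normally stream only the ranges that differ, not the full ~100 GB.

```python
def transfer_seconds(data_gb, throttle_megabits_per_sec):
    """Time to move data_gb gigabytes at a fixed throttle.

    Assumes 1 GB = 1024 MB and 1 MB = 8 megabits, and ignores
    validation (Merkle tree) work, compaction, and protocol overhead.
    """
    megabits = data_gb * 1024 * 8
    return megabits / throttle_megabits_per_sec


if __name__ == "__main__":
    # Placeholder numbers: 100 GB per node, throttle of 400 megabits/s.
    seconds = transfer_seconds(100, 400)
    print(f"{seconds:.0f} s, about {seconds / 60:.0f} minutes")
```

Lowering the throttle stretches the transfer out proportionally, which is the trade-off the throttles are there to make: a longer repair in exchange for less impact on live traffic.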

> Also, some of the keyspaces I am considering increase the replication factor
> for contains Counter Column Families (has rf:1).  I think I have read that
> adding replication to counter cfs will affect performance negatively, is
> this correct?

Per Sylvain (one of the primary authors of the Counters codebase)  [1] :

"
For counters, it's a little bit different. At RF=3, for each inserts,
one node is doing a write *and* a read, while the two other nodes are
only doing a write. So given that the read takes a time is non
negligible, you should see simple improvement a RF=3 compared to RF=1
because each node gets 1/3 of the reads (involved in the counter write)
it would get if it was the only replica. Now if the write time were
negligible compared to the read time, then yes you would see roughly a
3x increase. But while writes are still faster than reads in Cassandra,
reads a now fairly fast too (but all this depends on other factor like
how much the caches helps, etc...), so it will likely be less than a 3x
increase. Should be noticeable though.
"

I interpret the above to mean that RF=3 is actually slightly *faster*
for Counters than RF=1.
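
To make the arithmetic in Sylvain's description concrete, here is a minimal sketch of the per-replica cost of one counter increment. It is a simplified model, not a benchmark: it assumes every replica performs one write per increment, that the single read is spread evenly across the RF replicas, and that costs are in arbitrary units. The function names and the cost values are hypothetical.

```python
def per_replica_cost(rf, write_cost, read_cost):
    """Average work one replica does per counter increment.

    Every replica writes; only one of the rf replicas also reads,
    so on average each replica performs 1/rf of a read.
    """
    return write_cost + read_cost / rf


def speedup(rf, write_cost, read_cost):
    """Per-replica cost at RF=1 divided by the per-replica cost at rf."""
    return per_replica_cost(1, write_cost, read_cost) / per_replica_cost(
        rf, write_cost, read_cost
    )


if __name__ == "__main__":
    # Write cost negligible compared to the read: approaches 3x at RF=3.
    print(speedup(3, write_cost=0.0, read_cost=1.0))
    # Write and read equally expensive: about 1.5x at RF=3.
    print(speedup(3, write_cost=1.0, read_cost=1.0))
```

Under this model the improvement at RF=3 always falls between 1x and 3x, depending on how expensive the read is relative to the write, which matches the "less than a 3x increase" in the quote.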

=Rob

[1] http://mail-archives.apache.org/mod_mbox/cassandra-user/201110.mbox/%3CCAKkz8Q0ThzzSBu2370MX6jPeEC3Lh17Pjmv1koJGgAuaJupCtQ@mail.gmail.com%3E