Posted to user@cassandra.apache.org by Viswanathan Ramachandran <vi...@gmail.com> on 2014/08/12 23:06:46 UTC

Nodetool Repair questions

Some questions on nodetool repair.

1. This tool repairs inconsistencies across replicas of a row. Since the
latest update always wins, I don't see how inconsistencies could arise other
than from the combination of deletes, tombstones, and crashed nodes.
Technically, if data is never deleted from Cassandra, then nodetool repair
does not need to be run at all. Is this understanding correct? If not,
can anyone describe other ways inconsistencies could occur?

2. I want to understand the performance of 'nodetool repair' in a Cassandra
multi data center setup. As we add nodes to the cluster in various data
centers, does the cost of running nodetool repair on each node grow
linearly, or quadratically? The essence of this question is: if I have a
keyspace with x replicas in each data center, do I have to deal
with an upper limit on the number of data centers/nodes?


Thanks

Vish

Re: Nodetool Repair questions

Posted by Viswanathan Ramachandran <vi...@gmail.com>.
Thanks Mark,
Since we have replicas in each data center, adding a new data center
(and new replicas) has a performance implication for nodetool repair.
I do understand that adding nodes without increasing the number of replicas may
improve repair performance, but in this case we are adding a new data center
and additional replicas, which is added overhead for nodetool repair.
Hence the thinking that we may reach an upper limit, at the point where
the cost of nodetool repair becomes too high.
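
One rough way to put numbers on the concern above. The sketch assumes a simplified model in which every replica of a token range is compared against every other replica during repair; that model is an assumption made for illustration, not something stated in this thread, and the function names are hypothetical.

```python
def replicas_per_range(replicas_per_dc: int, num_dcs: int) -> int:
    """Total replicas of one token range, with the same replica count in every DC."""
    return replicas_per_dc * num_dcs


def pairwise_comparisons(total_replicas: int) -> int:
    """Number of distinct replica pairs, k * (k - 1) / 2 (assumed comparison model)."""
    return total_replicas * (total_replicas - 1) // 2


if __name__ == "__main__":
    x = 3  # replicas per data center (illustrative value)
    for dcs in (1, 2, 3, 4):
        k = replicas_per_range(x, dcs)
        print(f"{dcs} DC(s): {k} replicas per range, "
              f"{pairwise_comparisons(k)} replica pairs")
```

Under this model the replica count per range grows linearly with the number of data centers, while the number of replica pairs grows quadratically: with x = 3, going from one data center to four takes the pair count from 3 to 66. Whether real repair times follow that curve depends on factors the model ignores, such as data volume per node and how much data is actually out of sync.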


On Tue, Aug 12, 2014 at 2:59 PM, Mark Reddy <ma...@boxever.com> wrote:

> Hi Vish,
>
>> 1. This tool repairs inconsistencies across replicas of the row. Since
>> latest update always wins, I dont see inconsistencies other than ones
>> resulting from the combination of deletes, tombstones, and crashed nodes.
>> Technically, if data is never deleted from cassandra, then nodetool repair
>> does not need to be run at all. Is this understanding correct? If wrong,
>> can anyone provide other ways inconsistencies could occur?
>>
>
> Even if you never delete data you should run repairs occasionally to
> ensure overall consistency. While hinted handoffs and read repairs do lead
> to better consistency, they are only helpers/optimization and are not
> regarded as operations that ensure consistency.
>
>> 2. Want to understand the performance of 'nodetool repair' in a Cassandra
>> multi data center setup. As we add nodes to the cluster in various data
>> centers, does the performance of nodetool repair on each node increase
>> linearly, or is it quadratic ?
>>
>
> It's difficult to predict the performance of a repair; I've seen the time
> to completion fluctuate between 4 hrs and 10+ hrs on the same node. However,
> in theory, adding more nodes would spread the data and free up machine
> resources, resulting in faster repairs.
>
>> The essence of this question is: If I have a keyspace with x number of
>> replicas in each data center, do I have to deal with an upper limit on the
>> number of data centers/nodes?
>
>
> Could you expand on why you believe there would be an upper limit of
> dc/nodes due to running repairs?
>
>
> Mark

Re: Nodetool Repair questions

Posted by Mark Reddy <ma...@boxever.com>.
Hi Vish,

> 1. This tool repairs inconsistencies across replicas of the row. Since
> latest update always wins, I dont see inconsistencies other than ones
> resulting from the combination of deletes, tombstones, and crashed nodes.
> Technically, if data is never deleted from cassandra, then nodetool repair
> does not need to be run at all. Is this understanding correct? If wrong,
> can anyone provide other ways inconsistencies could occur?
>

Even if you never delete data, you should run repairs occasionally to ensure
overall consistency. While hinted handoffs and read repairs do lead to
better consistency, they are only helpers/optimizations and are not regarded
as operations that ensure consistency.

> 2. Want to understand the performance of 'nodetool repair' in a Cassandra
> multi data center setup. As we add nodes to the cluster in various data
> centers, does the performance of nodetool repair on each node increase
> linearly, or is it quadratic ?
>

It's difficult to predict the performance of a repair; I've seen the time
to completion fluctuate between 4 hrs and 10+ hrs on the same node. However,
in theory, adding more nodes would spread the data and free up machine
resources, resulting in faster repairs.
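
The "spread the data" argument can be put in rough numbers. A minimal sketch, assuming data is spread evenly across the nodes of a data center and that each data center holds its own full set of replicas; both are simplifying assumptions, and the function name is hypothetical.

```python
def data_per_node_gb(unique_data_gb: float, replicas_in_dc: int, nodes_in_dc: int) -> float:
    """Approximate data held by each node in one data center, in GB."""
    if nodes_in_dc < replicas_in_dc:
        raise ValueError("a data center needs at least as many nodes as replicas")
    # Each DC stores replicas_in_dc copies of the data, split across its nodes.
    return unique_data_gb * replicas_in_dc / nodes_in_dc


if __name__ == "__main__":
    # Illustrative values: 1000 GB of unique data, 3 replicas in the DC.
    for nodes in (6, 12):
        print(nodes, "nodes:", data_per_node_gb(1000, 3, nodes), "GB per node")
```

In this model, doubling the node count while keeping the replica count fixed halves the data each node has to repair.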

> The essence of this question is: If I have a keyspace with x number of
> replicas in each data center, do I have to deal with an upper limit on the
> number of data centers/nodes?


Could you expand on why you believe there would be an upper limit of
dc/nodes due to running repairs?


Mark



Re: Nodetool Repair questions

Posted by Andrey Ilinykh <ai...@gmail.com>.
On Tue, Aug 12, 2014 at 4:46 PM, Viswanathan Ramachandran <
vish.ramachandran@gmail.com> wrote:

> Andrey, QUORUM consistency and no deletes makes perfect sense.
> I believe we could modify that to EACH_QUORUM or QUORUM consistency and no
> deletes - isn't that right?
>

Yes.
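
For reference, the two consistency levels differ in where the majority is counted. A small sketch of the replica counts involved, assuming a keyspace replicated to several data centers; the data center names and replication factors are made up for illustration, and the helper names are hypothetical.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Majority of a replica set: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


def quorum_acks(rf_by_dc: Dict[str, int]) -> int:
    """QUORUM: a majority of all replicas, counted across every data center."""
    return quorum(sum(rf_by_dc.values()))


def each_quorum_acks(rf_by_dc: Dict[str, int]) -> Dict[str, int]:
    """EACH_QUORUM: a majority of the replicas within each data center."""
    return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}


if __name__ == "__main__":
    rf_by_dc = {"dc1": 3, "dc2": 3}  # hypothetical replication settings
    print("QUORUM needs", quorum_acks(rf_by_dc), "replicas in total")
    print("EACH_QUORUM needs", each_quorum_acks(rf_by_dc))
```

With three replicas in each of two data centers, QUORUM needs 4 of the 6 replicas from anywhere in the cluster, while EACH_QUORUM needs 2 of 3 in every data center.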

Re: Nodetool Repair questions

Posted by Viswanathan Ramachandran <vi...@gmail.com>.
Andrey, QUORUM consistency and no deletes makes perfect sense.
I believe we could modify that to EACH_QUORUM or QUORUM consistency and no
deletes - isn't that right?

Thanks


On Tue, Aug 12, 2014 at 3:10 PM, Andrey Ilinykh <ai...@gmail.com> wrote:

> 1. You don't have to repair if you use QUORUM consistency and you don't
> delete data.
> 2. Performance depends on the amount of data each node holds. It's very
> difficult to predict. It may take days.
>
> Thank you,
>   Andrey

Re: Nodetool Repair questions

Posted by Andrey Ilinykh <ai...@gmail.com>.
1. You don't have to repair if you use QUORUM consistency and you don't
delete data.
2. Performance depends on the amount of data each node holds. It's very
difficult to predict. It may take days.

Thank you,
  Andrey
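
The reasoning behind point 1 is presumably the usual overlap argument: if the number of replicas acknowledging a write plus the number consulted on a read exceeds the replication factor, every read touches at least one replica that saw the latest write. A minimal sketch of that arithmetic, with hypothetical function names; it illustrates the overlap condition only and says nothing about how repair itself works.

```python
def quorum(replication_factor: int) -> int:
    """Majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when any read set must share at least one replica with any write set."""
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    rf = 3  # illustrative replication factor
    q = quorum(rf)
    print("QUORUM reads + QUORUM writes overlap:", reads_overlap_writes(rf, q, q))
    print("ONE reads + ONE writes overlap:", reads_overlap_writes(rf, 1, 1))
```

With a replication factor of 3, QUORUM is 2, so QUORUM reads and writes always overlap (2 + 2 > 3), whereas reading and writing at ONE does not guarantee it (1 + 1 is not greater than 3).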

