Posted to user@cassandra.apache.org by Omid Aladini <om...@gmail.com> on 2012/10/16 11:33:22 UTC

Is Anti Entropy repair idempotent with respect to transferred data?

Hey!

I was wondering whether the data streamed by Anti Entropy repair is
idempotent with respect to a fixed set of data and convergent with
respect to a mutating set of data, meaning that:

- Given that there are no mutations going on in the cluster and we run
repair multiple times, data would only be transferred the first time
(meaning that the Merkle trees would be equal after applying repair once).

- In case we have mutations on the cluster and we run repair multiple
times, the amount of data transferred on each repair is proportional
to the volume of messages lost since the last repair, plus whatever
differs because nodes take their snapshots (to build the Merkle trees
on) at slightly different times.

In my experience running repair on some counter data, the amount of
streamed data is much larger than could be explained by messages the
cluster lost or by snapshots being taken at different times.

I know the data will eventually be in sync on every repair, but I'm
more interested in whether Cassandra transfers excess data and how to
minimize this.

Does anybody have insights on this?

Thanks,
Omid

Re: Is Anti Entropy repair idempotent with respect to transferred data?

Posted by Omid Aladini <om...@gmail.com>.
Thanks Andrey. Also found this ticket regarding this issue:

https://issues.apache.org/jira/browse/CASSANDRA-2698

On Tue, Oct 16, 2012 at 8:00 PM, Andrey Ilinykh <ai...@gmail.com> wrote:
>> In my experience running repair on some counter data, the amount of
>> streamed data is much larger than could be explained by messages the
>> cluster lost or by snapshots being taken at different times.
>>
>> I know the data will eventually be in sync on every repair, but I'm
>> more interested in whether Cassandra transfers excess data and how to
>> minimize this.
>>
>> Does anybody have insights on this?
>>
> The problem is the granularity of the Merkle tree. Cassandra sends
> the regions whose hash values differ. A region can be much bigger
> than a single row.
>
> Andrey

Re: Is Anti Entropy repair idempotent with respect to transferred data?

Posted by Andrey Ilinykh <ai...@gmail.com>.
> In my experience running repair on some counter data, the amount of
> streamed data is much larger than could be explained by messages the
> cluster lost or by snapshots being taken at different times.
>
> I know the data will eventually be in sync on every repair, but I'm
> more interested in whether Cassandra transfers excess data and how to
> minimize this.
>
> Does anybody have insights on this?
>
The problem is the granularity of the Merkle tree. Cassandra sends
the regions whose hash values differ. A region can be much bigger
than a single row.

Andrey
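
To illustrate the granularity point above, here is a minimal toy sketch
in Python. It is not Cassandra's implementation: it compares only a flat
list of leaf hashes rather than a full tree, and the names leaf_index,
leaf_hashes and rows_to_stream, the MD5-modulo bucketing, and the leaf
counts are all assumptions made for the example. It shows that when a
single row differs between two replicas, every row sharing that row's
hashed region is selected for streaming, and that fewer, coarser regions
select more rows.

```python
import hashlib


def leaf_index(key: str, num_leaves: int) -> int:
    """Map a row key to one of num_leaves regions (toy stand-in for token ranges)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_leaves


def leaf_hashes(rows: dict, num_leaves: int) -> list:
    """Compute one hash per region, covering every row that falls into it."""
    hashers = [hashlib.md5() for _ in range(num_leaves)]
    for key in sorted(rows):  # fixed order so equal data gives equal hashes
        h = hashers[leaf_index(key, num_leaves)]
        h.update(key.encode() + b"\0" + rows[key].encode() + b"\0")
    return [h.hexdigest() for h in hashers]


def rows_to_stream(replica_a: dict, replica_b: dict, num_leaves: int) -> list:
    """Return replica_a's keys in every region whose hash differs from replica_b's."""
    hashes_a = leaf_hashes(replica_a, num_leaves)
    hashes_b = leaf_hashes(replica_b, num_leaves)
    mismatched = {i for i in range(num_leaves) if hashes_a[i] != hashes_b[i]}
    return [k for k in replica_a if leaf_index(k, num_leaves) in mismatched]


# Two replicas that differ in exactly one row.
replica_a = {f"row{i}": "v1" for i in range(10_000)}
replica_b = dict(replica_a)
replica_b["row42"] = "stale"

coarse = rows_to_stream(replica_a, replica_b, num_leaves=16)
fine = rows_to_stream(replica_a, replica_b, num_leaves=4096)
print("rows selected with 16 regions:  ", len(coarse))
print("rows selected with 4096 regions:", len(fine))
```

In this toy model both runs select row42, but the 16-region run also
selects every other row that hashes into the same region. Identical
replicas select nothing, which matches the expectation in the original
question that a second repair over unchanged data transfers no rows.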