Posted to user@cassandra.apache.org by Paulo Ricardo Motta Gomes <pa...@chaordicsystems.com> on 2014/06/25 16:43:58 UTC

repair takes 10x more time in one DC compared to the other

Hello,

I'm running repair on a large CF with the "--local" flag in 2 different
DCs. In one of the DCs the operation takes about 1 hour per node, while in
the other it takes 10 hours per node.

I would expect the times to differ, but not by this much. All the writes to
that CF come from the DC where it takes 10 hours per node; could this be
why it takes so long in this DC?

Additional info: C* 1.2.16, both DCs have the same replication factor.

Cheers,

-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200

Re: repair takes 10x more time in one DC compared to the other

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Thu, Jun 26, 2014 at 4:06 AM, Paulo Ricardo Motta Gomes <
paulo.motta@chaordicsystems.com> wrote:

>
> [...] since you may want to repair nodes sequentially in the local DC
> (-local) without re-repairing ranges of neighbor nodes (-pr).
>

Nobody disagrees that this would be nice to have; we're just saying that
this currently doesn't work, so we disallow it for now so that people like
you don't get bitten. If you have a patch ready to fix it, please do feel
free to contribute.

--
Sylvain

Re: repair takes 10x more time in one DC compared to the other

Posted by Paulo Ricardo Motta Gomes <pa...@chaordicsystems.com>.
Hmm, good to find out, thanks for the reference! This explains the time
differences between repairs in different DCs.

But I think using -local and -pr together should still be supported, since
you may want to repair nodes sequentially in the local DC (-local) without
re-repairing ranges of neighbor nodes (-pr).
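What such a mode would need is a "primary range" computed within the local DC only, so that the local nodes split the ring between them with no overlap. A minimal sketch of that idea, assuming a hypothetical toy ring of 100 tokens, three nodes per DC with DC2's tokens offset by +1, and node names of the form "<dc>-<name>". This is unlike the whole-ring primary ranges described in this thread; the function name is illustrative only, not part of Cassandra.

```python
# Hypothetical DC-local primary ranges on a toy ring (not Cassandra 1.2/2.0 behavior).
RING_SIZE = 100  # toy token space; real partitioners use a far larger range


def local_primary_ranges(tokens_by_node, local_dc, ring_size=RING_SIZE):
    """Map each node of `local_dc` to (start, end, size), where its range is
    (previous token of the same DC, own token]; other DCs' tokens are ignored."""
    local = sorted(
        ((node, token) for node, token in tokens_by_node.items()
         if node.split("-")[0] == local_dc),  # hypothetical "<dc>-<name>" names
        key=lambda kv: kv[1],
    )
    ranges = {}
    for i, (node, token) in enumerate(local):
        prev_token = local[i - 1][1]  # the first node wraps to the last local token
        # "or ring_size" covers a DC with a single node, which owns the whole ring
        size = (token - prev_token) % ring_size or ring_size
        ranges[node] = (prev_token, token, size)
    return ranges


tokens = {
    "dc1-a": 0, "dc1-b": 33, "dc1-c": 66,
    "dc2-a": 1, "dc2-b": 34, "dc2-c": 67,  # DC2 offset by +1
}
dc2 = local_primary_ranges(tokens, "dc2")
print(dc2)  # {'dc2-a': (67, 1, 34), 'dc2-b': (1, 34, 33), 'dc2-c': (34, 67, 33)}
print(sum(size for _, _, size in dc2.values()))  # 100
```

The three DC2 ranges (34, 33 and 33 tokens) add up to the full ring, so repairing them one node at a time would cover every range in DC2 exactly once.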


On Wed, Jun 25, 2014 at 1:48 PM, Sylvain Lebresne <sy...@datastax.com>
wrote:

> I see. Well, you shouldn't use "-local" and "-pr" together; they don't
> make sense together. That is why their combination will be rejected in
> 2.0.9 (see https://issues.apache.org/jira/browse/CASSANDRA-7317 for
> details). Basically, the result of using both is that lots of stuff
> doesn't get repaired.

Re: repair takes 10x more time in one DC compared to the other

Posted by Sylvain Lebresne <sy...@datastax.com>.
I see. Well, you shouldn't use "-local" and "-pr" together; they don't
make sense together. That is why their combination will be rejected in
2.0.9 (see https://issues.apache.org/jira/browse/CASSANDRA-7317 for
details). Basically, the result of using both is that lots of stuff
doesn't get repaired.
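Why the combination leaves data unrepaired can be seen with a toy model. A minimal sketch, assuming a hypothetical 100-token ring with three nodes per DC (DC2's tokens offset by +1), node names of the form "<dc>-<name>", enough replicas in each DC for a DC-local repair to be meaningful, and "-pr -local" run on every node of both DCs. Each node repairs only its whole-ring primary range, and only among its own DC's replicas, so a range gets repaired in a DC only if its primary owner lives in that DC. The function name is illustrative, not Cassandra code.

```python
# Toy model of running "-pr -local" repair on every node of both DCs
# (hypothetical ring and names; not Cassandra's actual code).
RING_SIZE = 100  # toy token space


def dc_of(node):
    # Hypothetical naming convention: "<dc>-<name>"
    return node.split("-")[0]


def unrepaired_tokens_per_dc(tokens_by_node, ring_size=RING_SIZE):
    """For each DC, count the tokens never repaired among that DC's replicas.

    Each node repairs only its whole-ring primary range (previous token, own
    token], and only against replicas in its own DC (-local).
    """
    ring = sorted(tokens_by_node.items(), key=lambda kv: kv[1])
    dcs = sorted({dc_of(node) for node in tokens_by_node})
    missed = {dc: 0 for dc in dcs}
    for i, (node, token) in enumerate(ring):
        size = (token - ring[i - 1][1]) % ring_size or ring_size
        # Only the primary owner's DC repairs this range; every other DC skips it.
        for dc in dcs:
            if dc != dc_of(node):
                missed[dc] += size
    return missed


tokens = {
    "dc1-a": 0, "dc1-b": 33, "dc1-c": 66,
    "dc2-a": 1, "dc2-b": 34, "dc2-c": 67,  # DC2 offset by +1
}
print(unrepaired_tokens_per_dc(tokens))  # {'dc1': 3, 'dc2': 97}
```

With these tokens, 97 of the 100 tokens are never repaired among DC2's replicas (and 3 among DC1's), because almost all of the whole-ring primary ranges belong to DC1 nodes.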


On Wed, Jun 25, 2014 at 6:11 PM, Paulo Ricardo Motta Gomes <
paulo.motta@chaordicsystems.com> wrote:

> Thanks for the explanation, but I got slightly confused:
>
> From my understanding, you just described the behavior of the
> -pr/--partitioner-range option: "Repair only the first range returned by
> the partitioner for the node.", so I would expect repairs of the same CF
> in different DCs with only the -pr option to take different times.
>
> However, according to the description of the -local/--in-local-dc option,
> it "only repairs against nodes in the same data center", but you said that
> "the range will be repaired for all replicas in all data-centers", even
> with the "-local" option. Or did you confuse it with the "-pr" option?
>
> In any case, I'm using both the "-local" and "-pr" options. What is the
> expected behavior in that case?
>
> Cheers,

Re: repair takes 10x more time in one DC compared to the other

Posted by Paulo Ricardo Motta Gomes <pa...@chaordicsystems.com>.
Thanks for the explanation, but I got slightly confused:

From my understanding, you just described the behavior of the
-pr/--partitioner-range option: "Repair only the first range returned by
the partitioner for the node.", so I would expect repairs of the same CF
in different DCs with only the -pr option to take different times.

However, according to the description of the -local/--in-local-dc option,
it "only repairs against nodes in the same data center", but you said that
"the range will be repaired for all replicas in all data-centers", even
with the "-local" option. Or did you confuse it with the "-pr" option?

In any case, I'm using both the "-local" and "-pr" options. What is the
expected behavior in that case?

Cheers,



On Wed, Jun 25, 2014 at 12:46 PM, Sylvain Lebresne <sy...@datastax.com>
wrote:

> TL;DR: this is not unexpected, and it is perfectly fine.
>
> For every node, 'repair --local' will repair the "primary" range of the
> node in the ring (where primary means "the first range on the ring picked
> by the consistent hashing for this node given its token", nothing more).
> And that range will be repaired for all replicas in all data-centers. When
> you assign tokens to multiple DCs, it's actually pretty common to offset
> the tokens of one DC slightly compared to the other one. This results in
> the "primary" ranges always being small in one DC but not in the other.
> But please note that this is perfectly OK: it does not imply any imbalance
> between data-centers. It also doesn't really mean that the nodes of one DC
> actually do a lot more work than the other ones: all nodes most likely
> contribute roughly the same amount of work to the repair. It only means
> that the nodes of one DC "coordinate" more repair work than those of the
> other DC, which is not really a big deal since coordinating a repair is
> cheap.

Re: repair takes 10x more time in one DC compared to the other

Posted by Sylvain Lebresne <sy...@datastax.com>.
TL;DR: this is not unexpected, and it is perfectly fine.

For every node, 'repair --local' will repair the "primary" range of the
node in the ring (where primary means "the first range on the ring picked
by the consistent hashing for this node given its token", nothing more).
And that range will be repaired for all replicas in all data-centers. When
you assign tokens to multiple DCs, it's actually pretty common to offset
the tokens of one DC slightly compared to the other one. This results in
the "primary" ranges always being small in one DC but not in the other.
But please note that this is perfectly OK: it does not imply any imbalance
between data-centers. It also doesn't really mean that the nodes of one DC
actually do a lot more work than the other ones: all nodes most likely
contribute roughly the same amount of work to the repair. It only means
that the nodes of one DC "coordinate" more repair work than those of the
other DC, which is not really a big deal since coordinating a repair is
cheap.
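The effect of offset tokens on primary range sizes can be illustrated with a toy ring. A minimal sketch, assuming a hypothetical 100-token ring, three nodes per DC, DC2's tokens offset by +1 from DC1's, and node names of the form "<dc>-<name>"; real clusters use the partitioner's much larger token range. Each node's primary range is taken as (previous token on the whole ring, own token], following the definition given in the reply. The function names are illustrative only.

```python
# Toy model of whole-ring primary ranges in a two-DC cluster (hypothetical values).
RING_SIZE = 100  # toy token space; real partitioners use a far larger range


def dc_of(node):
    # Hypothetical naming convention: "<dc>-<name>"
    return node.split("-")[0]


def primary_ranges(tokens_by_node, ring_size=RING_SIZE):
    """Map each node to (start, end, size) for its primary range
    (previous token on the whole ring, own token]."""
    ring = sorted(tokens_by_node.items(), key=lambda kv: kv[1])
    ranges = {}
    for i, (node, token) in enumerate(ring):
        prev_token = ring[i - 1][1]  # for i == 0 this wraps to the last token
        # "or ring_size" handles a single-node ring, which owns everything
        size = (token - prev_token) % ring_size or ring_size
        ranges[node] = (prev_token, token, size)
    return ranges


def primary_share_by_dc(tokens_by_node, ring_size=RING_SIZE):
    """Total primary-range size (in tokens) coordinated by each DC."""
    share = {}
    for node, (_, _, size) in primary_ranges(tokens_by_node, ring_size).items():
        share[dc_of(node)] = share.get(dc_of(node), 0) + size
    return share


tokens = {
    "dc1-a": 0, "dc1-b": 33, "dc1-c": 66,
    "dc2-a": 1, "dc2-b": 34, "dc2-c": 67,  # DC2 offset by +1
}
print(primary_share_by_dc(tokens))  # {'dc1': 97, 'dc2': 3}
```

With these tokens, DC1's primary ranges cover 97 of the 100 tokens and DC2's only 3, so DC1 nodes coordinate nearly all of the primary-range repairs, even though each range is still repaired across the replicas in both DCs.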
