Posted to user@cassandra.apache.org by Gábor Auth <au...@gmail.com> on 2017/03/15 09:50:22 UTC

Slow repair

Hi,

We are working with a two-DC Cassandra cluster (EU and US), with a network
latency of over 160 ms between them. I've added a new DC to this cluster,
modified the keyspace's replication factor, and am trying to rebalance it with
repair, but the repair is very slow (over 10-15 minutes per node per
keyspace with ~40 column families). Is this normal with this network latency,
or is something wrong with the cluster or the network connection? :)

[2017-03-15 05:52:38,255] Starting repair command #4, repairing keyspace
test20151222 with repair options (parallelism: parallel, primary range:
true, incremental: false, job threads: 1, ColumnFamilies: [], dataCenters:
[], hosts: [], # of ranges: 32)
[2017-03-15 05:54:11,913] Repair session
988bd850-0943-11e7-9c1f-f5ba092c6aea for range
[(-3328182031191101706,-3263206086630594139],
(-449681117114180865,-426983008087217811],
(-4940101276128910421,-4726878962587262390],
(-4999008077542282524,-4940101276128910421]] finished (progress: 11%)
[2017-03-15 05:55:39,721] Repair session
9a6fda92-0943-11e7-9c1f-f5ba092c6aea for range
[(7538662821591320245,7564364667721298414],
(8095771383100385537,8112071444788258953],
(-1625703837190283897,-1600176580612824092],
(-1075557915997532230,-1072724867906442440],
(-9152563942239372475,-9123254980705325471],
(7485905313674392326,7513617239634230698]] finished (progress: 14%)
[2017-03-15 05:57:05,718] Repair session
9de181b1-0943-11e7-9c1f-f5ba092c6aea for range
[(-6471953894734787784,-6420063839816736750],
(1372322727565611879,1480899944406172322],
(1176263633569625668,1177285361971054591],
(440549646067640682,491840653569315468],
(-4312829975221321282,-4177428401237878410]] finished (progress: 17%)
[2017-03-15 05:58:39,997] Repair session
a18bc500-0943-11e7-9c1f-f5ba092c6aea for range
[(5327651902976749177,5359189884199963589],
(-5362946313988105342,-5348008210198062914],
(-5756557262823877856,-5652851311492822149],
(-5400778420101537991,-5362946313988105342],
(6682536072120412021,6904193483670147322]] finished (progress: 20%)
[2017-03-15 05:59:11,791] Repair session
a44f2ac2-0943-11e7-9c1f-f5ba092c6aea for range
[(952873612468870228,1042958763135655298],
(558544893991295379,572114658167804730]] finished (progress: 22%)
[2017-03-15 05:59:56,197] Repair session
a5e13c71-0943-11e7-9c1f-f5ba092c6aea for range
[(1914238614647876002,1961526714897144472],
(3610056520286573718,3619622957324752442],
(-3506227577233676363,-3504718440405535976],
(-4120686433235827731,-4098515820338981500],
(5651594158011135924,5668698324546997949]] finished (progress: 25%)
[2017-03-15 06:00:45,610] Repair session
a897a9e1-0943-11e7-9c1f-f5ba092c6aea for range
[(-9007733666337543056,-8979974976044921941]] finished (progress: 28%)
[2017-03-15 06:01:58,826] Repair session
a927b4e1-0943-11e7-9c1f-f5ba092c6aea for range
[(3599745202434925817,3608662806723095677],
(3390003128426746316,3391135639180043521],
(3391135639180043521,3529019003015169892]] finished (progress: 31%)
[2017-03-15 06:03:15,440] Repair session
aae06160-0943-11e7-9c1f-f5ba092c6aea for range
[(-7542303048667795773,-7300899534947316960]] finished (progress: 34%)
[2017-03-15 06:03:17,786] Repair completed successfully
[2017-03-15 06:03:17,787] Repair command #4 finished in 10 minutes 39
seconds

Bye,
Gábor Auth
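To see where the 10 minutes 39 seconds go, the timestamps in the repair output above can be compared. The following is a minimal Python sketch (stdlib only, function names hypothetical) that measures the gap between each "Repair session ... finished" line and the previous timestamped line, plus the total elapsed time. It assumes every progress message starts on a line with the `[YYYY-MM-DD HH:MM:SS,mmm]` prefix shown above; a gap only approximates a session's duration, since it also depends on how Cassandra schedules the sessions.

```python
import re
from datetime import datetime

# nodetool repair prefixes its progress lines with "[YYYY-MM-DD HH:MM:SS,mmm]".
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")


def session_gaps(log_text):
    """Seconds between each 'Repair session' line and the previous timestamped line."""
    gaps, prev = [], None
    for line in log_text.splitlines():
        ts = parse_ts(line)
        if ts is None:
            continue  # wrapped continuation lines (ranges, UUIDs) carry no timestamp
        if prev is not None and "Repair session" in line:
            gaps.append((ts - prev).total_seconds())
        prev = ts
    return gaps


def elapsed_seconds(log_text):
    """Seconds between the first and last timestamped lines."""
    stamps = [ts for ts in map(parse_ts, log_text.splitlines()) if ts is not None]
    return (stamps[-1] - stamps[0]).total_seconds() if stamps else 0.0


if __name__ == "__main__":
    # A shortened excerpt of the repair output above.
    sample = "\n".join([
        "[2017-03-15 05:52:38,255] Starting repair command #4, repairing keyspace",
        "[2017-03-15 05:54:11,913] Repair session",
        "988bd850-0943-11e7-9c1f-f5ba092c6aea for range",
        "[2017-03-15 05:55:39,721] Repair session",
        "[2017-03-15 06:03:17,787] Repair command #4 finished in 10 minutes 39",
        "seconds",
    ])
    print(session_gaps(sample), elapsed_seconds(sample))
```

With the log above, the first gap is about 94 seconds and the total is about 639.5 seconds, which matches the reported 10 minutes 39 seconds.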

Re: Slow repair

Posted by siddharth verma <si...@gmail.com>.
Hi,
We did a similar thing when a new DC was added and we had to populate it
according to the keyspace's altered replication.
For the repair we used the Tickler approach rather than an actual nodetool
repair (relying on Cassandra's blocking read repair feature).

You can see:
1. Tickler by ckalantzis: https://github.com/ckalantzis/cassTickler
2. Modified tickler (written in Java):
https://github.com/siddv29/eleventh-hour-repair
Clone it, run mvn clean install, and run it on any machine.
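The tickler approach described above reads every row at a high consistency level so that Cassandra's read repair writes any missing data to the replicas in the new DC. The following is a minimal Python sketch of the read side only, not the code of either linked project. It assumes the default Murmur3Partitioner (tokens in (-2^63, 2^63-1]) and a hypothetical table `ks.tbl` with partition key `id`, and it only builds the range-scan statements; actually executing them (for example with the DataStax Python driver at consistency level ALL, with paging) is left out.

```python
# Assumption: Murmur3Partitioner, whose tokens lie in (MIN_TOKEN, MAX_TOKEN].
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(n):
    """Split the token ring into n contiguous (start, end] ranges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    width = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (width * i) // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def tickle_queries(keyspace, table, partition_key, n):
    """Build one full-row range scan per token sub-range.

    Reading every column (SELECT *) matters: read repair can only fix the
    data that the query actually reads.
    """
    return [
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {start} AND token({partition_key}) <= {end}"
        for start, end in split_token_ring(n)
    ]


if __name__ == "__main__":
    # Hypothetical keyspace/table/column names; each statement would be
    # executed at consistency level ALL so that every replica is compared.
    for q in tickle_queries("ks", "tbl", "id", 4):
        print(q)
```

Splitting the ring into many small ranges keeps each scan short, so a timeout only forces one range to be retried rather than the whole table.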

Regards

On Wed, Mar 15, 2017 at 3:57 PM, Ben Slater <be...@instaclustr.com>
wrote:

> When you say you’re running repair to “rebalance” do you mean to populate
> the new DC? If so, the normal/correct procedure is to use nodetool rebuild
> rather than repair. See
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
> for the full details.
>
> Cheers
> Ben
>
>
> *Ben Slater*
>
> *Chief Product Officer <https://www.instaclustr.com/>*
>
> <https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
>    <https://www.linkedin.com/company/instaclustr>
>
> Read our latest technical blog posts here
> <https://www.instaclustr.com/blog/>.
>
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
> and Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>



-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high-speed Cassandra full-table
scan)

Re: Slow repair

Posted by Gábor Auth <au...@gmail.com>.
Hi,

On Wed, Mar 15, 2017 at 11:35 AM Ben Slater <be...@instaclustr.com>
wrote:

> When you say you’re running repair to “rebalance” do you mean to populate
> the new DC? If so, the normal/correct procedure is to use nodetool rebuild
> rather than repair.
>

Oh, thank you! :)

Bye,
Gábor Auth


Re: Slow repair

Posted by Ben Slater <be...@instaclustr.com>.
When you say you’re running repair to “rebalance” do you mean to populate
the new DC? If so, the normal/correct procedure is to use nodetool rebuild
rather than repair. See
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
for the full details.

Cheers
Ben
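Following the suggestion above, each node in the new DC streams its data from an existing DC with `nodetool rebuild`. The following is a minimal Python sketch that builds, and optionally runs, one `nodetool -h <host> rebuild -- <source DC>` command per new node, one node at a time. The host addresses and the DC name `EU` are hypothetical placeholders, and `-h` assumes JMX is reachable remotely (otherwise run the printed command locally on each node). The linked DataStax procedure lists the preconditions, such as updating the keyspace replication first.

```python
import shlex
import subprocess

# Hypothetical placeholders: replace with the real new-DC node addresses and
# the name of an existing data center (as reported by `nodetool status`).
NEW_DC_HOSTS = ["10.0.2.1", "10.0.2.2", "10.0.2.3"]
SOURCE_DC = "EU"


def rebuild_commands(hosts, source_dc):
    """Return one `nodetool rebuild` argv per node in the new DC.

    `-h` selects which node nodetool connects to (over JMX); the name of the
    data center to stream from follows the `--` separator.
    """
    return [["nodetool", "-h", host, "rebuild", "--", source_dc] for host in hosts]


def run_sequentially(commands, dry_run=True):
    """Print each command; unless dry_run, run it and stop on the first failure."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if nodetool exits non-zero.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_sequentially(rebuild_commands(NEW_DC_HOSTS, SOURCE_DC), dry_run=True)
```

Running the rebuilds sequentially is a conservative choice that limits the streaming load on the source DC; with `dry_run=True` the script only prints the commands.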
