Posted to user@cassandra.apache.org by "Mohapatra, Kishore" <Ki...@nuance.com> on 2017/09/15 16:09:47 UTC

Cassandra repair process in Low Bandwidth Network

Hi,
       we have a Cassandra cluster with seven nodes in each of three datacenters, running C* 2.1.15.4.
Network bandwidth between DC1 and DC2 is very good (10 Gbit/s) over a dedicated link. However, the network pipe between DC1 and DC3, and between DC2 and DC3, is very poor: it is only 100 Mbit/s and also goes through a VPN. Each node holds about 100 GB of data, and the keyspace has an RF of 3. Whenever we run repair, it fails with streaming errors and never completes. I have already set the streaming timeout parameter to a very high value, but that did not help. I can repair just the local DC, or just the first two DCs, but repair of DC3 fails whenever I combine it with the other two DCs.

So how can I successfully repair the keyspace in this kind of environment?

I see that there is a parameter to throttle inter-DC stream throughput, which defaults to 200 Mbit/s. What is the minimum value I could set it to without affecting the cluster?
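For reference, these are the relevant knobs as named in C* 2.1 (worth confirming against your version's nodetool help), plus a back-of-envelope check on what a low throttle implies for repair duration; this is a sketch, not a tuning recommendation:

```shell
# Persistent setting in cassandra.yaml (read at startup, value in Mbit/s):
#   inter_dc_stream_throughput_outbound_megabits_per_sec: 20
#
# Runtime equivalent, run per node (reverts to the yaml value on restart):
#   nodetool setinterdcstreamthroughput 20
#
# Sanity-check the arithmetic: streaming 100 GB (800,000 Mbit) at
# 20 Mbit/s takes 800000 / 20 = 40000 seconds, roughly 11 hours.
secs=$(( 100 * 1000 * 8 / 20 ))   # 100 GB at 20 Mbit/s, in seconds
echo "$secs"                       # prints 40000
```

In other words, the lower the throttle, the longer each streaming session stays open, so a low throttle usually needs to be paired with smaller repair ranges.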

Is there any other way to make repair work in this kind of environment?
I would appreciate your feedback and help on this.


Thanks

Kishore Mohapatra
Principal Operations DBA
Seattle, WA
Email : kishore.mohapatra@nuance.com



RE: [EXTERNAL] Re: Cassandra repair process in Low Bandwidth Network

Posted by "Mohapatra, Kishore" <Ki...@nuance.com>.
Hi Jeff,
                      Thanks for your reply.
In fact, I have already tried all of these options:

  1.  We use Cassandra Reaper for our repairs, which does subrange repair.
  2.  I have also developed a shell script that does exactly what Reaper does, but can control how many repair sessions run concurrently.
  3.  I have also tried full repair.
  4.  I have tried running repair on two DCs at a time. Repair between DC1 and DC2 goes fine, but repair between DC1 and DC3, or between DC2 and DC3, fails.
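As a rough sketch of what such a wrapper script does (hypothetical code; the keyspace name my_keyspace and the fixed 32-way split are made up for illustration, and real tools split each node's owned ranges rather than the whole ring), it divides the full Murmur3 token range into contiguous subranges and emits one nodetool repair -st/-et command per piece:

```shell
gen_repair_cmds() {
  # Split the full Murmur3 token range [-2^63, 2^63 - 1] into
  # $parts contiguous subranges (parts must be a power of two here).
  local parts=32
  local step=$(( (2**62 / parts) * 4 ))   # 2^64 / parts, without overflow
  local lo=$(( -9223372036854775807 - 1 )) # -2^63
  local hi i
  for (( i = 0; i < parts; i++ )); do
    if (( i == parts - 1 )); then
      hi=9223372036854775807               # 2^63 - 1, close the ring
    else
      hi=$(( lo + step ))
    fi
    echo "nodetool repair -st $lo -et $hi my_keyspace"
    lo=$hi
  done
}
gen_repair_cmds
```

Running the emitted commands one at a time, rather than in parallel, is what keeps the number of concurrent streaming sessions (and thus the load on the VPN link) down.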

So I will try setting the inter-DC stream throughput to 20 Mbit/s and see how that goes.

Is there anything else that could be done in this case?

Thanks

Kishore Mohapatra
Principal Operations DBA
Seattle, WA
Email : kishore.mohapatra@nuance.com


From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Friday, September 15, 2017 10:27 AM
To: cassandra <us...@cassandra.apache.org>
Subject: [EXTERNAL] Re: Cassandra repair process in Low Bandwidth Network

Hi Kishore,

Just to make sure we're all on the same page, I presume you're doing full repairs using something like 'nodetool repair -pr', which repairs all data for a given token range across all of your hosts in all of your dcs. Is that a correct assumption to start?

In addition to throttling inter-dc stream throughput (which you should be able to set quite low - perhaps as low as 20 Mbps), you may also want to consider smaller ranges (using a concept we call subrange repair, where instead of using -pr, you pass -st and -et - which is what tools like http://cassandra-reaper.io/ do) - this will keep streams smaller (in terms of total bytes transferred per streaming session, though you'll have more sessions). Finally, you can use -host and -dc options to limit repair so that sessions don't always hit all 3 dcs - for example, you could do a repair between DC1 and DC2 using -dc, then do a repair of DC1 and DC3 using -dc - it's a lot more coordination required, but likely helps cut down on the traffic over your VPN link.




Re: Cassandra repair process in Low Bandwidth Network

Posted by Jeff Jirsa <jj...@gmail.com>.
Hi Kishore,

Just to make sure we're all on the same page, I presume you're doing full
repairs using something like 'nodetool repair -pr', which repairs all data
for a given token range across all of your hosts in all of your dcs. Is
that a correct assumption to start?

In addition to throttling inter-dc stream throughput (which you should be
able to set quite low - perhaps as low as 20 Mbps), you may also want to
consider smaller ranges (using a concept we call subrange repair, where
instead of using -pr, you pass -st and -et - which is what tools like
http://cassandra-reaper.io/ do ) - this will keep streams smaller (in terms
of total bytes transferred per streaming session, though you'll have more
sessions). Finally, you can use -host and -dc options to limit repair so
that sessions don't always hit all 3 dcs - for example, you could do a
repair between DC1 and DC2 using -dc, then do a repair of DC1 and DC3 using
-dc - it's a lot more coordination required, but likely helps cut down on
the traffic over your VPN link.
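A minimal sketch of that pairwise schedule (hypothetical: the keyspace name my_keyspace is made up, the commands are only echoed rather than executed, and the exact flag spelling -dc / --in-dc should be checked against nodetool help repair for your version):

```shell
run_dc_pair_repairs() {
  # One repair session per DC pair, so no single session ever has to
  # stream across all three datacenters at once.
  local pairs=("DC1 DC2" "DC1 DC3" "DC2 DC3")
  local p
  for p in "${pairs[@]}"; do
    set -- $p   # split the pair into positional args $1 and $2
    # Replace echo with the real invocation once the flags are confirmed;
    # the sessions touching DC3 are best run off-peak on the VPN link.
    echo "nodetool repair -pr -dc $1 -dc $2 my_keyspace"
  done
}
run_dc_pair_repairs
```

Note that each DC pair still repairs every token range, so this trades extra total work for never saturating the slow link with a three-way session.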


