Posted to user@cassandra.apache.org by Bryce Godfrey <Br...@azaleos.com> on 2012/10/25 19:44:55 UTC

High bandwidth usage between datacenters for cluster

We have a 5 node cluster, with a matching 5 nodes for DR in another data center. With a replication factor of 3, does the node I send a write to attempt to send it to the 3 servers in the DR also? Or does it send it to 1 and let it replicate locally in the DR environment to save bandwidth across the WAN?
Normally this isn't an issue for us, but at times we are writing approximately 1MB a sec of data, and seeing a corresponding 3MB of traffic across the WAN to all the Cassandra DR servers.

If my assumptions are right, is this configurable somehow for writing to one node and letting it do local replication? We are on 1.1.5.

Thanks

Re: High bandwidth usage between datacenters for cluster

Posted by "B. Todd Burruss" <bt...@gmail.com>.
Bryce, did you resolve this? I'm interested in the outcome.

When you write, does it help to use CL = LOCAL_QUORUM?
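
For reference, here is a rough sketch of how many replica acknowledgements a coordinator waits for at a few consistency levels. It is a simplified model, not Cassandra's code; the function name and the `rf_per_dc` mapping are hypothetical. The consistency level changes how many acknowledgements are awaited, while the write is still sent to every replica in every DC, so on its own it would not be expected to reduce WAN traffic.

```python
def required_acks(consistency_level, rf_per_dc, local_dc):
    """Acks the coordinator waits for before answering the client (simplified)."""
    total_rf = sum(rf_per_dc.values())
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        # Majority of all replicas across every DC.
        return total_rf // 2 + 1
    if consistency_level == "LOCAL_QUORUM":
        # Majority of the replicas in the coordinator's own DC only.
        return rf_per_dc[local_dc] // 2 + 1
    if consistency_level == "ALL":
        return total_rf
    raise ValueError("unsupported consistency level: %s" % consistency_level)


# Hypothetical DC names; replication factor 3 in each DC as in this thread.
rf = {"DC1": 3, "DR": 3}
for cl in ("ONE", "LOCAL_QUORUM", "QUORUM", "ALL"):
    print(cl, required_acks(cl, rf, "DC1"))
```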

On Mon, Oct 29, 2012 at 12:52 AM, aaron morton <aa...@thelastpickle.com> wrote:
> Outbound messages for other DCs are grouped and a single instance is sent
> to a single node in the remote DC. The remote node then forwards the message
> on to the other recipients in its DC. All remote DC nodes will however
> reply directly to the coordinator.
>
> > Normally this isn’t an issue for us, but at times we are writing
> > approximately 1MB a sec of data, and seeing a corresponding 3MB of traffic
> > across the WAN to all the Cassandra DR servers.
>
> Can you break the traffic down by port and direction?

Re: High bandwidth usage between datacenters for cluster

Posted by aaron morton <aa...@thelastpickle.com>.
Outbound messages for other DCs are grouped and a single instance is sent to a single node in the remote DC. The remote node then forwards the message on to the other recipients in its DC. All remote DC nodes will however reply directly to the coordinator.
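
As a back-of-the-envelope check of what that forwarding means for bandwidth, here is a minimal sketch. The function and parameter names are hypothetical, and the model only counts the write payload leaving the coordinator's DC; it ignores the acknowledgements coming back and any other inter-DC traffic.

```python
def wan_outbound_mb_per_s(write_mb_per_s, remote_rf, forwarding=True):
    """Estimated write payload crossing the WAN toward the remote DC.

    With forwarding, one copy of each write crosses the WAN and the
    receiving node passes it on to the other replicas in its own DC.
    Without forwarding, the coordinator sends one copy per remote replica.
    """
    copies_over_wan = 1 if forwarding else remote_rf
    return write_mb_per_s * copies_over_wan


# Figures from this thread: about 1MB a sec written, replication factor 3.
print(wan_outbound_mb_per_s(1.0, 3, forwarding=True))
print(wan_outbound_mb_per_s(1.0, 3, forwarding=False))
```

Under this model the reported 3MB looks like the no-forwarding case, which is why a breakdown of the actual traffic is the useful next step.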

> Normally this isn’t an issue for us, but at times we are writing approximately 1MB a sec of data, and seeing a corresponding 3MB of traffic across the WAN to all the Cassandra DR servers.

Can you break the traffic down by port and direction?
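
One way such a breakdown could be tallied, as a sketch only: the flow records, DC names and byte counts below are made up, and in practice they would come from whatever flow capture is available on the WAN link. Port 7000 is assumed here to be the inter-node storage port (the default `storage_port`).

```python
from collections import defaultdict


def traffic_by_port_and_direction(flows):
    """Sum bytes per (destination port, direction).

    Each flow is a tuple: (source DC, destination DC, destination port, bytes).
    Flows that stay inside one DC are skipped, since only WAN traffic matters.
    """
    totals = defaultdict(int)
    for src_dc, dst_dc, dst_port, num_bytes in flows:
        if src_dc == dst_dc:
            continue
        direction = "%s->%s" % (src_dc, dst_dc)
        totals[(dst_port, direction)] += num_bytes
    return dict(totals)


# Hypothetical sample records.
sample_flows = [
    ("DC1", "DR", 7000, 1000),
    ("DC1", "DR", 7000, 2000),
    ("DR", "DC1", 7000, 300),
    ("DC1", "DC1", 7000, 5000),
]
for key, total in sorted(traffic_by_port_and_direction(sample_flows).items()):
    print(key, total)
```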

Cheers



-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/10/2012, at 12:18 PM, Bryce Godfrey <Br...@azaleos.com> wrote:

> Network topology with the topology file filled out is already the configuration we are using. 


RE: High bandwidth usage between datacenters for cluster

Posted by Bryce Godfrey <Br...@azaleos.com>.
Network topology with the topology file filled out is already the configuration we are using.

From: sankalp kohli [mailto:kohlisankalp@gmail.com]
Sent: Thursday, October 25, 2012 11:55 AM
To: user@cassandra.apache.org
Subject: Re: High bandwidth usage between datacenters for cluster

Use placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and also fill in the topology.properties file. This will tell Cassandra that you have two DCs. You can verify that by looking at the output of the ring command.

If your DCs are set up properly, only one request will go over the WAN, though the responses from all nodes in the other DC will go over the WAN.



Re: High bandwidth usage between datacenters for cluster

Posted by sankalp kohli <ko...@gmail.com>.
Use placement_strategy =
'org.apache.cassandra.locator.NetworkTopologyStrategy' and also fill in the
topology.properties file. This will tell Cassandra that you have two DCs.
You can verify that by looking at the output of the ring command.

If your DCs are set up properly, only one request will go over the WAN, though
the responses from all nodes in the other DC will go over the WAN.
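
A quick way to sanity-check the topology file before looking at the ring output is sketched below. It assumes the `ip=DC:RACK` line format used by the property-file snitch; the addresses, DC names and rack names are hypothetical, and `nodes_per_dc` is an illustrative helper, not part of Cassandra.

```python
from collections import Counter


def nodes_per_dc(properties_text):
    """Count nodes per data center in topology-properties style text."""
    counts = Counter()
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        if key.strip() == "default":
            continue  # fallback entry for unknown nodes, not a real node
        dc, _, _rack = value.strip().partition(":")
        counts[dc] += 1
    return dict(counts)


# Hypothetical file contents.
SAMPLE = """\
# production
10.0.1.1=DC1:RAC1
10.0.1.2=DC1:RAC1
# disaster recovery
10.0.2.1=DR:RAC1
10.0.2.2=DR:RAC1
default=DC1:RAC1
"""
print(nodes_per_dc(SAMPLE))
```

If only one DC name shows up, or a node is missing, the cluster would not see two data centers the way the advice above expects.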

On Thu, Oct 25, 2012 at 10:44 AM, Bryce Godfrey <Br...@azaleos.com> wrote:

> We have a 5 node cluster, with a matching 5 nodes for DR in another data
> center. With a replication factor of 3, does the node I send a write to
> attempt to send it to the 3 servers in the DR also? Or does it send it to
> 1 and let it replicate locally in the DR environment to save bandwidth
> across the WAN?
>
> Normally this isn’t an issue for us, but at times we are writing
> approximately 1MB a sec of data, and seeing a corresponding 3MB of traffic
> across the WAN to all the Cassandra DR servers.
>
> If my assumptions are right, is this configurable somehow for writing to
> one node and letting it do local replication? We are on 1.1.5.
>
> Thanks

Re: High bandwidth usage between datacenters for cluster

Posted by "Hiller, Dean" <De...@nrel.gov>.
Try the datacenter-aware replication strategy, so that you tell Cassandra about all your data centers, racks, etc.

Dean

From: Bryce Godfrey <Br...@azaleos.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Thursday, October 25, 2012 11:44 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: High bandwidth usage between datacenters for cluster

We have a 5 node cluster, with a matching 5 nodes for DR in another data center. With a replication factor of 3, does the node I send a write to attempt to send it to the 3 servers in the DR also? Or does it send it to 1 and let it replicate locally in the DR environment to save bandwidth across the WAN?
Normally this isn’t an issue for us, but at times we are writing approximately 1MB a sec of data, and seeing a corresponding 3MB of traffic across the WAN to all the Cassandra DR servers.

If my assumptions are right, is this configurable somehow for writing to one node and letting it do local replication? We are on 1.1.5.

Thanks