Posted to user@cassandra.apache.org by Erik Holstad <er...@gmail.com> on 2010/03/31 02:49:17 UTC

Replicating data over the wan?

Is anyone using datacenter-aware replication where the replication takes
place over the WAN and not over super fast optical cable between the
centers?

I tried to look at all posts related to the topic but haven't really found
much, only some remarks about not doing that if using ZooKeeper and a few
other small comments.

What are the limitations of this kind of replication? What happens when
writes come in too fast for the replication to keep up, assuming there is
some kind of write buffer for the replication?

I'm not too worried about the inconsistent state of the cluster, just
about how well it would work in this kind of environment.

-- 
Regards Erik

Re: Replicating data over the wan?

Posted by David Strauss <da...@fourkitchens.com>.
On 2010-03-31 01:42, Erik Holstad wrote:
> I'm not too worried about inconsistency in the data, more about whether
> things like the gossip protocol would saturate the WAN.

I haven't tried inter-DC replication, but I would be surprised if gossip
saturated a line with any decent bandwidth.

> In that case, is the replica in the other datacenter treated as if that
> node is down, so that you start replicating to a different node using
> hinted hand-off?

Assuming you're using RF >= 2 and a RackAware strategy, there should be
a node to properly write to for any row in either DC without need for
hinted hand-off. Writes should then succeed if you use CL.ONE, even with
a disconnect between DCs.
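
To make that concrete, here is a toy sketch of the reasoning, not Cassandra
code. The Replica type, the node names, and the one-replica-per-DC placement
are all assumptions for illustration; the only point is that a write needing
one ack still finds a reachable replica when the inter-DC link is down, while
a write needing every ack does not.

```python
from dataclasses import dataclass

# Toy model only: this is NOT how Cassandra places replicas or routes writes.

@dataclass(frozen=True)
class Replica:
    name: str  # hypothetical node name
    dc: str    # hypothetical data center label


def reachable_replicas(replicas, coordinator_dc, link_up):
    """Replicas the coordinator can reach; remote-DC replicas drop out
    when the inter-DC link is down."""
    return [r for r in replicas if link_up or r.dc == coordinator_dc]


def write_succeeds(replicas, coordinator_dc, link_up, required_acks):
    """A write succeeds if enough replicas are reachable to ack it."""
    reachable = reachable_replicas(replicas, coordinator_dc, link_up)
    return len(reachable) >= required_acks


# Assumed placement: RF = 2 with one replica in each data center.
replicas = [Replica("node-a", "dc1"), Replica("node-b", "dc2")]

ONE = 1                   # acks needed at consistency level ONE
ALL = len(replicas)       # acks needed if every replica must respond

ok_one_partitioned = write_succeeds(replicas, "dc1", False, ONE)
ok_all_partitioned = write_succeeds(replicas, "dc1", False, ALL)
ok_all_connected = write_succeeds(replicas, "dc1", True, ALL)

print(ok_one_partitioned, ok_all_partitioned, ok_all_connected)
```

With the link down, only the ONE case goes through in this model; the write
to the remote replica would have to be delivered later.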

-- 
David Strauss
   | david@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]


Re: Replicating data over the wan?

Posted by David Timothy Strauss <da...@fourkitchens.com>.
Your ConsistencyLevel will change the effect. If CL is low, inconsistency will temporarily occur between the DCs. If CL is high, writes will have noticeably high latency.


Re: Replicating data over the wan?

Posted by Avinash Lakshman <av...@gmail.com>.
How far apart are the data centers? Technically there will be an increase in
latency for the writes if you are waiting for acks from the replicas. How
long does a simple ping take between machines in these data centers? If
inconsistency is not an issue you can mitigate this by doing asynchronous
replication, waiting for a successful response from just one of the
replicas. Apart from that I can't think of anything that is fundamentally an
impediment. We replicate across the east and west coasts, and a ping
typically takes 75 ms.
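
As a rough back-of-the-envelope sketch of that trade-off (a simplified model,
not a measurement): assume the coordinator sends the write to all replicas in
parallel and the write returns once the required number of acks has arrived.
The 1 ms local round trip and the function name are assumptions; the 75 ms
figure is the ping time mentioned above. Processing time on the replicas is
ignored.

```python
def write_latency_ms(replica_rtts_ms, required_acks):
    """Time until `required_acks` acks have arrived, assuming the write is
    sent to every replica in parallel and each ack takes one round trip."""
    if not 1 <= required_acks <= len(replica_rtts_ms):
        raise ValueError("required_acks must be between 1 and the replica count")
    # The k-th fastest round trip decides when the k-th ack arrives.
    return sorted(replica_rtts_ms)[required_acks - 1]


LOCAL_RTT_MS = 1.0   # assumed round trip to a replica in the same data center
WAN_RTT_MS = 75.0    # round trip between coasts, from the ping time above

rtts = [LOCAL_RTT_MS, WAN_RTT_MS]

latency_one_ack = write_latency_ms(rtts, 1)   # wait for just one replica
latency_all_acks = write_latency_ms(rtts, 2)  # wait for both replicas

print(latency_one_ack, latency_all_acks)
```

Under these assumptions, waiting for one ack costs about the local round
trip, while waiting for both replicas costs about the WAN round trip on
every write.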


Re: Replicating data over the wan?

Posted by Erik Holstad <er...@gmail.com>.
Thanks David and Jonathan for the info.

Those two links were pretty much the only things I found about this
issue, but I wasn't sure that just because it works across different zones
it would also work across different regions.


-- 
Regards Erik

Re: Replicating data over the wan?

Posted by Jonathan Ellis <jb...@gmail.com>.
http://permalink.gmane.org/gmane.comp.db.cassandra.user/3462
http://permalink.gmane.org/gmane.comp.db.cassandra.user/3483
