Posted to user@cassandra.apache.org by Héctor Izquierdo Seliva <iz...@strands.com> on 2011/02/21 15:10:15 UTC

Replicate changes from DC1 to DC2, but not from DC2 to DC1

Hi all.

Is there a way (besides changing the code) to replicate data from data
center 1 to data center 2, but not the other way around? I need to
have a preproduction environment with production data, ideally with
only a fraction of the data (for example, selected by key prefixes). I have
poked around in StorageProxy and can make writes in DC2 not replicate to
DC1, and as long as I use DC_QUORUM it stays that way, but it
looks... dangerous. I could do a full key scan, but it would take too
long.

Has anybody done something similar?

Thanks!



Re: Replicate changes from DC1 to DC2, but not from DC2 to DC1

Posted by Jonathan Ellis <jb...@gmail.com>.
That would cause a lot of subtle breakage, e.g. confusing Repair.

2011/2/22 Héctor Izquierdo Seliva <iz...@strands.com>:
> On Tue, 22-02-2011 at 08:46 +1300, Aaron Morton wrote:
>> Take a look at the NetworkTopologyStrategy and/or the RackInferringSnitch: together they decide where replicas are placed. It's probably not a great idea to muck around with this stuff, though.
>>
>> How about a Hadoop job to pull out the data you want? It would be a full scan, but in parallel.
>
> I looked at that, but correct me if I'm wrong: schema changes are
> distributed to all nodes, and they all have to agree on a version, so I
> can't have keyspace A in DC1 use NetworkTopologyStrategy with options
> [{DC1:1, DC2:1}] and the same keyspace in DC2 use options [{DC2:1,
> DC1:0}]. Is that correct?
>
>> Aaron



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Replicate changes from DC1 to DC2, but not from DC2 to DC1

Posted by Héctor Izquierdo Seliva <iz...@strands.com>.
On Tue, 22-02-2011 at 08:46 +1300, Aaron Morton wrote:
> Take a look at the NetworkTopologyStrategy and/or the RackInferringSnitch: together they decide where replicas are placed. It's probably not a great idea to muck around with this stuff, though.
> 
> How about a Hadoop job to pull out the data you want? It would be a full scan, but in parallel.

I looked at that, but correct me if I'm wrong: schema changes are
distributed to all nodes, and they all have to agree on a version, so I
can't have keyspace A in DC1 use NetworkTopologyStrategy with options
[{DC1:1, DC2:1}] and the same keyspace in DC2 use options [{DC2:1,
DC1:0}]. Is that correct?

> Aaron



Re: Replicate changes from DC1 to DC2, but not from DC2 to DC1

Posted by Aaron Morton <aa...@thelastpickle.com>.
Take a look at the NetworkTopologyStrategy and/or the RackInferringSnitch: together they decide where replicas are placed. It's probably not a great idea to muck around with this stuff, though.
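Aaron's point can be illustrated with a deliberately simplified model. What follows is a minimal sketch, not Cassandra's implementation: DcPlacementSketch, Node and replicasFor are hypothetical names, racks are ignored, and the snitch is reduced to a fixed data center label on each node. Walking the ring clockwise from the key's token, it takes the first N nodes of each data center, where N comes from the keyspace's per-DC options. Because the result depends only on the token and those options, never on which data center coordinated the write, a write made in DC2 to a keyspace with {DC1:1, DC2:1} still gets a replica in DC1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-data-center replica placement in the spirit of
// NetworkTopologyStrategy. Rack awareness is omitted; the "snitch" is just
// a fixed DC label stored on each node.
public class DcPlacementSketch {

    // A ring member: its token and the data center a snitch would report for it.
    record Node(String name, long token, String dc) {}

    // Walk the ring clockwise from the key's token and take, for every data
    // center, the first replicationFactor.get(dc) nodes found in that DC.
    // The coordinator's DC plays no part: every node computes the same set.
    static List<Node> replicasFor(long keyToken, List<Node> ring,
                                  Map<String, Integer> replicationFactor) {
        List<Node> sorted = new ArrayList<>(ring);
        sorted.sort(Comparator.comparingLong(Node::token));

        // First node whose token is >= the key's token owns the key (wrap to 0).
        int start = 0;
        while (start < sorted.size() && sorted.get(start).token() < keyToken) {
            start++;
        }
        if (start == sorted.size()) {
            start = 0;
        }

        // Replicas still needed per DC; DCs absent from the options get none.
        Map<String, Integer> remaining = new HashMap<>(replicationFactor);
        List<Node> replicas = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Node n = sorted.get((start + i) % sorted.size());
            int left = remaining.getOrDefault(n.dc(), 0);
            if (left > 0) {
                replicas.add(n);
                remaining.put(n.dc(), left - 1);
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        List<Node> ring = List.of(
                new Node("a1", 0, "DC1"), new Node("b1", 10, "DC2"),
                new Node("a2", 50, "DC1"), new Node("b2", 60, "DC2"));
        // The keyspace options discussed in the thread: {DC1:1, DC2:1}.
        Map<String, Integer> options = Map.of("DC1", 1, "DC2", 1);
        System.out.println(replicasFor(55, ring, options));
    }
}
```

With the four-node ring in main, a key with token 55 lands on b2 (DC2) and a1 (DC1), whichever DC received the write. The real strategy also tries to spread each DC's replicas across racks, which this sketch leaves out.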

How about a Hadoop job to pull out the data you want? It would be a full scan, but in parallel.
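Aaron's Hadoop suggestion amounts to cutting the token space into slices and scanning them independently and in parallel. The sketch below illustrates that idea together with the key-prefix filter Héctor asked about. It is hypothetical throughout: PrefixScanSketch, scan and the in-memory TreeMap standing in for a column family are not Cassandra or Hadoop APIs, and tokens are assumed to be non-negative and below Long.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical parallel full scan: the token space is cut into contiguous
// slices, each slice is scanned by its own worker, and only rows whose key
// starts with the wanted prefix are kept. The TreeMap (token -> row key)
// stands in for a column family; it is not a Cassandra API.
public class PrefixScanSketch {

    static List<String> scan(NavigableMap<Long, String> rowsByToken, String prefix, int slices)
            throws InterruptedException, ExecutionException {
        if (slices < 1) {
            throw new IllegalArgumentException("slices must be >= 1");
        }
        if (rowsByToken.isEmpty()) {
            return new ArrayList<>();
        }
        long min = rowsByToken.firstKey();
        long max = rowsByToken.lastKey() + 1;                          // exclusive upper bound
        long width = Math.max(1, (max - min + slices - 1) / slices);   // ceiling division

        ExecutorService pool = Executors.newFixedThreadPool(slices);
        try {
            List<Future<List<String>>> parts = new ArrayList<>();
            for (long lo = min; lo < max; lo += width) {
                final long from = lo;
                final long to = Math.min(lo + width, max);
                // Each worker only reads its own [from, to) token slice.
                parts.add(pool.submit(() -> {
                    List<String> hits = new ArrayList<>();
                    for (String key : rowsByToken.subMap(from, true, to, false).values()) {
                        if (key.startsWith(prefix)) {
                            hits.add(key);
                        }
                    }
                    return hits;
                }));
            }
            // Concatenate slice results in token order.
            List<String> result = new ArrayList<>();
            for (Future<List<String>> f : parts) {
                result.addAll(f.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        NavigableMap<Long, String> rows = new TreeMap<>();
        rows.put(3L, "user:1");
        rows.put(17L, "order:9");
        rows.put(42L, "user:2");
        rows.put(90L, "session:x");
        System.out.println(scan(rows, "user:", 4));   // prints [user:1, user:2]
    }
}
```

The scan still reads every row; splitting it only changes how the work is distributed. Because the slices share nothing, a real job could hand each one to a different machine rather than a different thread.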

Aaron
