Posted to user@cassandra.apache.org by Faraaz Sareshwala <fs...@quantcast.com> on 2013/06/19 19:50:46 UTC

Joining distinct clusters with the same schema together

My company is planning on deploying cassandra to three separate datacenters.
Each datacenter will have a cassandra cluster with a separate set of seeds
specific to that datacenter. However, the cluster name will be the same.

Question 1: is this enough to guarantee that the three datacenters will have
distinct cassandra clusters as well? Or will one node in datacenter A still
somehow be able to join datacenter B's ring?

Cassandra has cross datacenter replication and we plan to use that in the
future. For now, we are planning on using our own relay mechanism to transfer
data changes from one datacenter to another. Each cassandra cluster in each
datacenter will have the same keyspaces and column families with the same
schema. Datacenter A will send mutations over this relay to datacenter B, which
will replay them in cassandra.  Therefore, datacenter A's cassandra
cluster will look identical to datacenter B's cassandra cluster, but not through
the cross datacenter replication that cassandra offers.
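
To make the relay idea concrete, here is a minimal sketch of the replay side
in the receiving datacenter (the keyspace, table, mutation format, and
addresses are all made up for illustration; it assumes the DataStax Python
driver):

    # Minimal sketch of the relay replay step; all names are hypothetical.
    # Assumes a table like:
    #   CREATE TABLE app.events (id text PRIMARY KEY, payload text);
    from cassandra.cluster import Cluster

    def replay_mutation(session, mutation):
        # Carry the original write time across so the replayed row merges
        # in the receiving cluster exactly as it did in the sending one.
        query = ("INSERT INTO app.events (id, payload) VALUES (%s, %s) "
                 "USING TIMESTAMP {}".format(mutation["timestamp_us"]))
        session.execute(query, (mutation["id"], mutation["payload"]))

    if __name__ == "__main__":
        cluster = Cluster(["10.1.0.10"])   # receiving DC's nodes (made up)
        session = cluster.connect()
        replay_mutation(session, {
            "id": "event-123",
            "payload": "example",
            "timestamp_us": 1371664246000000,  # microseconds since epoch
        })
        cluster.shutdown()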

Question 2: is this a sane strategy? We're trying to make the smallest possible
change when deploying cassandra. Our plan is to slowly move our infrastructure
over to relying more on cassandra once we can assess how it behaves with our
workload.

Question 3: eventually, we want to turn all these cassandra clusters into one
large multi-datacenter cluster. What's the best practice to do this? Should I
just add nodes from all datacenters to the list of seeds and let cassandra
resolve differences? Is there another way I don't know about?

Thank you,
Faraaz

Re: Joining distinct clusters with the same schema together

Posted by aaron morton <aa...@thelastpickle.com>.
> > Question 2: is this a sane strategy?
> 
> On its face my answer is "not... really"? 
I'd go with a solid no. 

Just because the three independent clusters have a schema that looks the same does not make it the same schema. The schema is a versioned document, and you will not be able to merge the clusters later by joining the DCs without downtime. 
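
For example, each node advertises a schema version UUID, and nodes only agree once they have all converged on the same one. A quick way to see it (a small sketch, assuming the DataStax Python driver and a made-up contact point):

    # Small sketch: print the schema version each node reports.
    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.10"])  # any reachable node (address is made up)
    session = cluster.connect()

    for row in session.execute("SELECT schema_version FROM system.local"):
        print("this node:", row.schema_version)

    for row in session.execute("SELECT peer, schema_version FROM system.peers"):
        print(row.peer, row.schema_version)

    cluster.shutdown()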

It will be easier to go with a multi DC setup from the start. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com



Re: Joining distinct clusters with the same schema together

Posted by Eric Stevens <mi...@gmail.com>.
>
> On its face my answer is "not... really"? What do you view yourself as
> getting with this technique versus using built-in replication? As an
> example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM
> consistency level operations.


Doing replication manually sounds like a recipe for the DCs eventually
getting subtly out of sync with each other.  If a connection goes down
between DCs, and you are taking writes in both, how will you catch each
other up?  C* already offers that resolution for you, and you'd have to
work pretty hard to reproduce it for no obvious benefit that I can see.

For minimum effort, definitely rely on Cassandra's well-tested codebase for
this.





Re: Joining distinct clusters with the same schema together

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala
<fs...@quantcast.com> wrote:
> Each datacenter will have a cassandra cluster with a separate set of seeds
> specific to that datacenter. However, the cluster name will be the same.
>
> > Question 1: is this enough to guarantee that the three datacenters will have
> > distinct cassandra clusters as well? Or will one node in datacenter A still
> > somehow be able to join datacenter B's ring?

If they have network connectivity and the same cluster name, they are
the same logical cluster. However, if your nodes share tokens and you
have auto_bootstrap=true (the implicit default), the second node you
attempt to start will refuse to start, because you would be trying to
bootstrap it into the range of a live node.

> For now, we are planning on using our own relay mechanism to transfer
> data changes from one datacenter to another.

Are you planning to use the streaming commitlog functionality for
this? I'm not sure how you would capture all changes otherwise, except
by having your app just write the same thing to multiple places. Also,
unless data timestamps are identical between clusters, data that is
otherwise identical will not merge properly, because cassandra uses
data timestamps to merge.
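
To illustrate the merge rule (last write wins by timestamp), a small sketch
with the DataStax Python driver; the keyspace, table, and address are made up:

    # Assumes: CREATE TABLE app.kv (k text PRIMARY KEY, v text);
    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.10"])  # made-up address
    session = cluster.connect("app")

    # Write the "new" value with a larger timestamp, then the "old" value
    # with a smaller one. Arrival order does not matter.
    session.execute(
        "INSERT INTO kv (k, v) VALUES ('x', 'new') USING TIMESTAMP 2000")
    session.execute(
        "INSERT INTO kv (k, v) VALUES ('x', 'old') USING TIMESTAMP 1000")

    # Prints 'new': the cell with the higher timestamp wins. A relay that
    # does not carry the original timestamps forfeits this guarantee.
    for row in session.execute("SELECT v FROM kv WHERE k = 'x'"):
        print(row.v)

    cluster.shutdown()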

> Question 2: is this a sane strategy?

On its face my answer is "not... really"? What do you view yourself as
getting with this technique versus using built-in replication? As an
example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM
consistency level operations.
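
For reference, this is roughly what those operations look like with the
DataStax Python driver (keyspace, table, and address are made up); neither
option exists if each DC is an isolated cluster fed by a relay:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["10.0.0.10"])  # made-up address
    session = cluster.connect("app")

    # LOCAL_QUORUM: ack once a quorum of replicas in the local DC respond.
    session.execute(SimpleStatement(
        "INSERT INTO kv (k, v) VALUES ('x', 'v1')",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM))

    # EACH_QUORUM: require a quorum of replicas in every DC before acking.
    session.execute(SimpleStatement(
        "INSERT INTO kv (k, v) VALUES ('x', 'v1')",
        consistency_level=ConsistencyLevel.EACH_QUORUM))

    cluster.shutdown()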

> Question 3: eventually, we want to turn all these cassandra clusters into one
> large multi-datacenter cluster. What's the best practice to do this? Should I
> just add nodes from all datacenters to the list of seeds and let cassandra
> resolve differences? Is there another way I don't know about?

If you are using NetworkTopologyStrategy and have the same cluster
name for your isolated clusters, all you need to do is:

1) configure NTS to store replicas on a per-datacenter basis (see the
sketch after this list)
2) ensure that your nodes are in different logical data centers (by
default, all nodes are in DC1/rack1)
3) ensure that clusters are able to reach each other
4) ensure that tokens do not overlap between clusters (the common
technique with manual token assignment is to offset each datacenter's
tokens by one)
5) ensure that all nodes' seed lists contain (the recommended) 3 seeds
from each DC
6) rolling restart (so the new seed list is picked up)
7) repair ("should" only be required if writes have not replicated via
your out of band mechanism)
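
As a sketch of step 1 (the keyspace and datacenter names below are made up;
they must match whatever names your snitch reports, e.g. what you put in
cassandra-rackdc.properties if you use GossipingPropertyFileSnitch):

    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.10"])  # made-up address
    session = cluster.connect()

    # Per-datacenter replica counts; NTS places replicas independently in
    # each DC it is told about.
    session.execute("""
        ALTER KEYSPACE app WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'DC_A': 3, 'DC_B': 3, 'DC_C': 3
        }
    """)

    cluster.shutdown()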

Vnodes change the picture slightly because the chance of your clusters
having conflicting tokens increases with the number of token ranges
you have.

=Rob