Posted to user@cassandra.apache.org by Spico Florin <sp...@gmail.com> on 2014/11/25 16:09:23 UTC

Data synchronization between 2 running clusters on different availability zone

Hello!
I have the following scenario:
1. To ensure high availability, I would like to install one Cassandra
cluster in one availability zone (Amazon EC2 US-east) and another Cassandra
cluster in a different AZ (Amazon EC2 US-west).
2. I have a pipeline that runs on Amazon EC2 US-east and feeds the
Cassandra cluster installed in that AZ.
Here are my questions:
1. Is this scenario feasible?
2. Is the architecture correct with regard to the availability of Cassandra?
3. If the architecture is fine, how do you keep data synchronized between
the two clusters?

I look forward to your answers.
Regards,
Florin

Re: Data synchronization between 2 running clusters on different availability zone

Posted by Jeremy Jongsma <je...@barchart.com>.
Here's a snitch we use for this situation - it uses a property file if it
exists, but falls back to EC2 autodiscovery if it is missing.

https://github.com/barchart/cassandra-plugins/blob/master/src/main/java/com/barchart/cassandra/plugins/snitch/GossipingPropertyFileWithEC2FallbackSnitch.java
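
The "property file if present, otherwise autodiscovery" behaviour described above can be sketched roughly as follows. This is not the linked Java snitch, only a minimal illustration of the fallback idea. The function names, the file name used in the demo, and the `discover` callback are assumptions made for the example; a real fallback would typically ask the EC2 instance metadata service for the node's placement instead of returning a fixed value.

```python
import os
from typing import Callable, Dict, Tuple


def read_properties(path: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def resolve_topology(path: str,
                     discover: Callable[[], Tuple[str, str]]) -> Tuple[str, str]:
    """Return (dc, rack) from the property file if it exists,
    otherwise from the discovery callback (hypothetical stand-in
    for EC2 autodiscovery)."""
    if os.path.exists(path):
        props = read_properties(path)
        return props["dc"], props["rack"]
    return discover()


if __name__ == "__main__":
    # Stub discovery: the values are illustrative, not real metadata.
    def fake_ec2_discovery() -> Tuple[str, str]:
        return ("us-east", "1a")

    print(resolve_topology("cassandra-rackdc.properties", fake_ec2_discovery))
```

Passing the discovery step in as a callback keeps the sketch testable without network access.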

On Mon, Dec 1, 2014 at 12:33 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Thu, Nov 27, 2014 at 1:24 AM, Spico Florin <sp...@gmail.com>
> wrote:
>
>>   I have another question. What about the following scenario: two
>> Cassandra instances installed on different cloud providers (EC2, Flexiant)?
>> How do you synchronize them? Can you use some internal tools or do I have
>> to implement my own mechanism?
>>
>
> That's what I meant by "if maybe hybrid in the future, use GPFS"
> (GPFS = GossipingPropertyFileSnitch):
>
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchGossipPF_c.html
>
> "hybrid" in this case means AWS-and-not-AWS.
>
> =Rob
>
>

Re: Data synchronization between 2 running clusters on different availability zone

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Nov 27, 2014 at 1:24 AM, Spico Florin <sp...@gmail.com> wrote:

>   I have another question. What about the following scenario: two
> Cassandra instances installed on different cloud providers (EC2, Flexiant)?
> How do you synchronize them? Can you use some internal tools or do I have
> to implement my own mechanism?
>

That's what I meant by "if maybe hybrid in the future, use GPFS"
(GPFS = GossipingPropertyFileSnitch):

http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchGossipPF_c.html

"hybrid" in this case means AWS-and-not-AWS.

=Rob

Re: Data synchronization between 2 running clusters on different availability zone

Posted by Eric Stevens <mi...@gmail.com>.
There's no reason you can't run on multiple cloud providers as long as you
treat them as logically distinct DCs. It should largely work the same way
as running in several AWS regions, but you'll need to use something like
GossipingPropertyFileSnitch, because the EC2 snitches are specific to AWS.
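
As a rough illustration of treating each provider as its own DC, the sketch below renders the two pieces of configuration this usually involves: the per-node cassandra-rackdc.properties content read by GossipingPropertyFileSnitch (keys `dc` and `rack`), and a keyspace definition using NetworkTopologyStrategy with a replica count per DC. The DC names `aws_east` and `flexiant`, the rack name, the keyspace name, and the replica counts are hypothetical, chosen only for the example.

```python
from typing import Dict


def rackdc_properties(dc: str, rack: str) -> str:
    """Content of cassandra-rackdc.properties for one node."""
    return f"dc={dc}\nrack={rack}\n"


def create_keyspace_cql(keyspace: str, replicas_per_dc: Dict[str, int]) -> str:
    """CQL statement replicating a keyspace across the named DCs."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {count}" for dc, count in replicas_per_dc.items()]
    options = ", ".join(parts)
    return f"CREATE KEYSPACE {keyspace} WITH replication = {{{options}}};"


if __name__ == "__main__":
    # One node in each (hypothetical) DC.
    print(rackdc_properties("aws_east", "rack1"))
    print(rackdc_properties("flexiant", "rack1"))
    # Three replicas in each DC; Cassandra then replicates writes
    # between the DCs itself, so no external sync tool is needed.
    print(create_keyspace_cql("app", {"aws_east": 3, "flexiant": 3}))
```

The DC names in the keyspace definition have to match the `dc` values the snitch reports, otherwise replicas are not placed where intended.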

On Thu Nov 27 2014 at 2:26:27 AM Spico Florin <sp...@gmail.com> wrote:

> Hello!
>   I have another question. What about the following scenario: two
> Cassandra instances installed on different cloud providers (EC2, Flexiant)?
> How do you synchronize them? Can you use some internal tools or do I have
> to implement my own mechanism?
> Thanks.
>  Florin
>

Re: Data synchronization between 2 running clusters on different availability zone

Posted by Spico Florin <sp...@gmail.com>.
Hello!
  I have another question. What about the following scenario: two Cassandra
instances installed on different cloud providers (EC2, Flexiant)? How do
you synchronize them? Can you use some internal tools or do I have to
implement my own mechanism?
Thanks.
 Florin


On Thu, Nov 27, 2014 at 11:18 AM, Spico Florin <sp...@gmail.com>
wrote:

> Hello, Rob!
>   Thank you very much for the detailed support.
> Regards,
>  Florin

Re: Data synchronization between 2 running clusters on different availability zone

Posted by Spico Florin <sp...@gmail.com>.
Hello, Rob!
  Thank you very much for the detailed support.
Regards,
 Florin

On Wed, Nov 26, 2014 at 12:41 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Nov 25, 2014 at 7:09 AM, Spico Florin <sp...@gmail.com>
> wrote:
>
>> 1. For ensuring high availability I would like to install one Cassandra
>> cluster on one availability zone
>> (on Amazon EC2 US-east) and one Cassandra cluster on other AZ (Amazon EC2
>> US-west).
>>
>
> The usual approach is one cluster with a replication factor of 2 and a
> rack-aware snitch. More accurately, people usually deploy with at least
> RF=3 across 3 AZs. An RF of at least 3 is also needed for the QUORUM
> consistency level to tolerate a replica being down: QUORUM needs
> floor(RF/2) + 1 replicas, which at RF=2 means both of them.
>
> If you will only ever operate out of EC2, you probably want to look into
> Ec2Snitch. If you plan to go multi-region eventually, use
> Ec2MultiRegionSnitch. If you might go hybrid in the future, use
> GossipingPropertyFileSnitch.
>
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2_t.html
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2MultiRegion_c.html
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchGossipPF_c.html
>
> For some good background on the internals, see:
>
> https://issues.apache.org/jira/browse/CASSANDRA-3810
>
> =Rob
> http://twitter.com/rcolidba
>
>

Re: Data synchronization between 2 running clusters on different availability zone

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Nov 25, 2014 at 7:09 AM, Spico Florin <sp...@gmail.com> wrote:

> 1. For ensuring high availability I would like to install one Cassandra
> cluster on one availability zone
> (on Amazon EC2 US-east) and one Cassandra cluster on other AZ (Amazon EC2
> US-west).
>

The usual approach is one cluster with a replication factor of 2 and a
rack-aware snitch. More accurately, people usually deploy with at least
RF=3 across 3 AZs. An RF of at least 3 is also needed for the QUORUM
consistency level to tolerate a replica being down: QUORUM needs
floor(RF/2) + 1 replicas, which at RF=2 means both of them.
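
The arithmetic behind the RF=3 recommendation can be checked with a short sketch. QUORUM needs a majority of the replicas, floor(RF/2) + 1, so the number of replicas that may be down is RF minus that majority. The function names are chosen for the example only.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerated failures={tolerated_failures(rf)}")
```

With RF=2 the quorum is 2, so a single unavailable replica fails QUORUM requests; RF=3 is the smallest value that survives one replica being down.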

If you will only ever operate out of EC2, you probably want to look into
Ec2Snitch. If you plan to go multi-region eventually, use
Ec2MultiRegionSnitch. If you might go hybrid in the future, use
GossipingPropertyFileSnitch.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2_t.html
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2MultiRegion_c.html
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchGossipPF_c.html
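
The rule of thumb above can be written down as a small decision function that also emits the matching `endpoint_snitch` line for cassandra.yaml. This is only a sketch of the advice in this message; the function and parameter names are made up for the example, and real deployments may have other reasons to pick a snitch.

```python
def choose_snitch(aws_only: bool, multi_region: bool = False) -> str:
    """Pick a snitch class name following the rule of thumb above."""
    if not aws_only:
        # Hybrid (AWS and non-AWS) or possibly hybrid later.
        return "GossipingPropertyFileSnitch"
    return "Ec2MultiRegionSnitch" if multi_region else "Ec2Snitch"


def yaml_setting(snitch: str) -> str:
    """The corresponding line for cassandra.yaml."""
    return f"endpoint_snitch: {snitch}"


if __name__ == "__main__":
    print(yaml_setting(choose_snitch(aws_only=True)))
    print(yaml_setting(choose_snitch(aws_only=True, multi_region=True)))
    print(yaml_setting(choose_snitch(aws_only=False)))
```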

For some good background on the internals, see:

https://issues.apache.org/jira/browse/CASSANDRA-3810

=Rob
http://twitter.com/rcolidba