Posted to user@cassandra.apache.org by Anishek Agarwal <an...@gmail.com> on 2016/03/14 07:10:29 UTC

Multi DC setup for analytics

Hello,

We are using Cassandra 2.0.17 and have two logical DCs that hold different
keyspaces, but both have the same logical name, DC1.

We want to set up another Cassandra cluster for analytics, which should get
data from both of the above DCs.

If we set up the new DC with the name DC2 and follow the steps at
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
will it work?

I would think we first have to change the names of the existing clusters
so that they differ, and then add another DC that gets its data from
them?

Also, as soon as we add the node, data starts moving... this will only be
the real-time changes made to the cluster, right? We still have to run a
rebuild to get the data for the tokens of the nodes in the new cluster?

Thanks
Anishek

Re: Multi DC setup for analytics

Posted by Laszlo Jobs <la...@gmail.com>.
Anishek,

AFAIK you cannot have clusters "overlap" each other.

Just an idea: Try to address it as an sstable restore.
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html

What I would try to do (not tested!):

- create a logical DC in each cluster (CLUSTER_1 and CLUSTER_2) with a
limited number of nodes, so you do not need to back up a lot of nodes. Let's
call it DC_AR (for Analytics Replica)

- alter replication factor of keyspaces and tables in CLUSTER_1 and
CLUSTER_2 clusters to use their own DC_AR DC, store at least 1 replica
there too (this is a minimal change to CLUSTER_1 and CLUSTER_2)

- when restoring to CLUSTER_3, create snapshots in the DC_AR of each
cluster (CLUSTER_1 and CLUSTER_2)

- follow the restore procedure described in the link above and restore
sstables to the analytics cluster CLUSTER_3

- create a logical DC (DA_A) in each Cluster, CLUSTER_1 and CLUSTER_2, with
one or more nodes according to your

- if you need more power on CLUSTER_3, you can add more nodes after the
restore and repair (could be time-consuming)
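
A minimal sketch of the first steps above, written as a Python helper that
only builds the statements and commands rather than running them. The
keyspace names (orders, events), the replication factors and the snapshot
tag are assumptions for illustration; none of them come from this thread.
Note that after the ALTER KEYSPACE the DC_AR nodes would presumably still
need a nodetool rebuild or a repair before a snapshot contains the existing
data.

```python
def alter_keyspace_cql(keyspace, replication):
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replication.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


def snapshot_command(keyspace, tag):
    """Build the nodetool command that snapshots one keyspace on a node."""
    return f"nodetool snapshot -t {tag} {keyspace}"


# Hypothetical layout: one keyspace per source cluster, one replica in DC_AR.
plan = {
    "CLUSTER_1": {"keyspace": "orders", "replication": {"DC1": 3, "DC_AR": 1}},
    "CLUSTER_2": {"keyspace": "events", "replication": {"DC2": 3, "DC_AR": 1}},
}


def build_steps(plan, tag="analytics_copy"):
    """Return (cluster, statement-or-command) pairs in execution order."""
    steps = []
    for cluster, spec in plan.items():
        steps.append((cluster, alter_keyspace_cql(spec["keyspace"], spec["replication"])))
        steps.append((cluster, snapshot_command(spec["keyspace"], tag)))
    return steps


for cluster, step in build_steps(plan):
    print(f"[{cluster}] {step}")
```

The snapshot command would be run on the DC_AR nodes only, which is what
keeps the number of nodes to back up small.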

You might tune the process above, as this is just a high-level idea.
You need to consider the following things, among others (I am sure the
list below is not complete):
- maintain the schema in the CLUSTER_3 keyspaces whenever it changes on
CLUSTER_1 or CLUSTER_2
- you cannot use the same keyspace names on CLUSTER_1 and CLUSTER_2
- the replication factor for the DC_AR DCs in both clusters, CLUSTER_1 and
CLUSTER_2
- which consistency level your application uses: QUORUM might hurt you,
but LOCAL_QUORUM could be OK
- ensure that clients are not connecting to DC_AR nodes (not a hard
requirement)
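
To make the consistency-level point concrete, here is a small worked
example. It assumes a keyspace with 3 replicas in DC1 that gains 1 replica
in DC_AR; these replication factors are illustrative and not taken from the
thread. A quorum is a majority of the replicas counted, so QUORUM counts
replicas in all DCs while LOCAL_QUORUM counts only those in the
coordinator's DC.

```python
def quorum(replica_count):
    """Majority of replica_count replicas."""
    return replica_count // 2 + 1


def required_replicas(replication, level, local_dc):
    """Replicas that must respond for one request at the given level."""
    if level == "QUORUM":
        return quorum(sum(replication.values()))
    if level == "LOCAL_QUORUM":
        return quorum(replication[local_dc])
    raise ValueError(f"unsupported consistency level: {level}")


# Hypothetical replication factors, before and after adding DC_AR.
before = {"DC1": 3}
after = {"DC1": 3, "DC_AR": 1}

for label, replication in (("before", before), ("after", after)):
    total = sum(replication.values())
    for level in ("QUORUM", "LOCAL_QUORUM"):
        needed = required_replicas(replication, level, local_dc="DC1")
        print(f"{label:6} {level:12} needs {needed} of {total} replicas")
```

Under these assumed numbers QUORUM goes from 2 of 3 to 3 of 4 replicas, so
with one DC1 node down every QUORUM request depends on the DC_AR replica.
LOCAL_QUORUM issued in DC1 stays at 2 and never waits on DC_AR.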

If this works, then you do not have to rebuild the clusters you have today
(CLUSTER_1 and CLUSTER_2).

P.S. I am relatively new to Cassandra and using only 3.x versions (using =
playing and learning).

Regards,

Laszlo


On Wed, Mar 30, 2016 at 8:43 AM, Anishek Agarwal <an...@gmail.com> wrote:

> Hey Guys,
>
> We made the necessary changes and were trying to get this back on track,
> but hit another wall.
>
> We have two clusters in different DCs (DC1 and DC2) with cluster names
> CLUSTER_1 and CLUSTER_2.
>
> We want a common analytics cluster in DC3 with the cluster name
> CLUSTER_3. It looks like this can't be done, so do we have to set up two
> different analytics clusters? Can't we just get data from CLUSTER_1 and
> CLUSTER_2 into the same cluster, CLUSTER_3?
>
> thanks
> anishek

Re: Multi DC setup for analytics

Posted by Anishek Agarwal <an...@gmail.com>.
Hey Bryan,

Thanks for the info; we had inferred as much. The only other thing we tried
was starting two separate instances in the analytics cluster on the same set
of machines, each talking to its respective DC, but we dropped that within
two minutes because we would have to change ports on at least one of the
existing DCs so that its nodes are on the same port as the analytics
instance they join.

For now we are just getting another set of machines for this.


I had known about the pattern of using a separate analytics cluster for
Cassandra, but thought we could join one across two clusters. My bad. Now
that I think of it, it would have been better to have just one DC for
real-time prod requests instead of two.

Are there ways of merging existing clusters into one cluster in Cassandra?


On Fri, Apr 1, 2016 at 5:05 AM, Bryan Cheng <br...@blockcypher.com> wrote:

> I'm jumping into this thread late, so sorry if this has been covered
> before. But am I correct in reading that you have two different Cassandra
> rings, not talking to each other at all, and you want to have a shared DC
> with a third Cassandra ring?
>
> I'm not sure what you want to do is possible.
>
> If I had the luxury of starting from scratch, the design I would do is:
> All three DCs in one cluster. DC3 is the analytics DC.
> DC1's keyspaces are replicated to DC1 and DC3 only.
> DC2's keyspaces are replicated to DC2 and DC3 only.
>
> Then you have DC3 with all data from both DC1 and DC2 to run analytics on,
> and no cross-talk between DC1 and DC2.
>
> If you cannot rebuild your existing clusters, you may want to consider
> using something like Spark to ETL your data out of DC1 and DC2 into a new
> cluster at DC3. At that point you're running a data warehouse and lose some
> of the advantages of seamless cluster membership.

Re: Multi DC setup for analytics

Posted by Bryan Cheng <br...@blockcypher.com>.
I'm jumping into this thread late, so sorry if this has been covered
before. But am I correct in reading that you have two different Cassandra
rings, not talking to each other at all, and you want to have a shared DC
with a third Cassandra ring?

I'm not sure what you want to do is possible.

If I had the luxury of starting from scratch, the design I would do is:
All three DCs in one cluster. DC3 is the analytics DC.
DC1's keyspaces are replicated to DC1 and DC3 only.
DC2's keyspaces are replicated to DC2 and DC3 only.

Then you have DC3 with all data from both DC1 and DC2 to run analytics on,
and no cross-talk between DC1 and DC2.
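
As a rough illustration of that layout, the sketch below generates the
keyspace definitions and checks that no keyspace is replicated to both DC1
and DC2. The keyspace names (prod_a, prod_b) and the replication factors are
assumptions made up for the example.

```python
def create_keyspace_cql(name, replication):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replication.items())
    return (
        f"CREATE KEYSPACE {name} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


def has_cross_talk(layout):
    """True if any keyspace is replicated to both DC1 and DC2."""
    return any("DC1" in dcs and "DC2" in dcs for dcs in layout.values())


# Hypothetical keyspaces: each lives in its own production DC plus DC3.
layout = {
    "prod_a": {"DC1": 3, "DC3": 1},
    "prod_b": {"DC2": 3, "DC3": 1},
}

for name, replication in layout.items():
    print(create_keyspace_cql(name, replication))

print("cross-talk between DC1 and DC2:", has_cross_talk(layout))
```

Because NetworkTopologyStrategy places replicas only in the DCs named in the
keyspace definition, leaving DC2 out of prod_a (and DC1 out of prod_b) is
what keeps the two production DCs from holding each other's data.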

If you cannot rebuild your existing clusters, you may want to consider
using something like Spark to ETL your data out of DC1 and DC2 into a new
cluster at DC3. At that point you're running a data warehouse and lose some
of the advantages of seamless cluster membership.
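
Whatever tool does the copying, one check worth making first is that the two
source clusters do not use the same keyspace name, since both would land in
the single target cluster. The sketch below plans the copy and fails on a
collision; it does not use Spark, and the cluster contents are hypothetical.

```python
def plan_copy(sources):
    """Return (source_cluster, 'keyspace.table') pairs to copy.

    sources maps a cluster name to a list of (keyspace, table) tuples.
    Raises ValueError if two clusters contain the same keyspace name.
    """
    owner_of = {}
    plan = []
    for cluster, tables in sources.items():
        for keyspace, table in tables:
            owner = owner_of.setdefault(keyspace, cluster)
            if owner != cluster:
                raise ValueError(
                    f"keyspace {keyspace!r} exists in both {owner} and {cluster}"
                )
            plan.append((cluster, f"{keyspace}.{table}"))
    return plan


# Hypothetical contents of the two source clusters.
sources = {
    "CLUSTER_1": [("orders", "by_customer"), ("orders", "by_day")],
    "CLUSTER_2": [("events", "raw")],
}

for cluster, target in plan_copy(sources):
    print(f"copy {target} from {cluster} into CLUSTER_3")
```

If a collision is found, one of the keyspaces would have to be written under
a different name in the target cluster.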

On Wed, Mar 30, 2016 at 5:43 AM, Anishek Agarwal <an...@gmail.com> wrote:

> Hey Guys,
>
> We did the necessary changes and were trying to get this back on track,
> but hit another wall,
>
> we have two Clusters in Different DC ( DC1 and DC2) with cluster names (
> CLUSTER_1, CLUSTER_2)
>
> we want to have a common analytics cluster in DC3 with cluster name
> (CLUSTER_3). -- looks like this can't be done, so we have to setup two
> different analytics cluster ? can't we just get data from CLUSTER_1/2 to
> same cluster CLUSTER_3 ?
>
> thanks
> anishek

Re: Multi DC setup for analytics

Posted by Anishek Agarwal <an...@gmail.com>.
Hey Guys,

We made the necessary changes and were trying to get this back on track, but
hit another wall.

We have two clusters in different DCs (DC1 and DC2) with cluster names
CLUSTER_1 and CLUSTER_2.

We want a common analytics cluster in DC3 with the cluster name
CLUSTER_3. It looks like this can't be done, so do we have to set up two
different analytics clusters? Can't we just get data from CLUSTER_1 and
CLUSTER_2 into the same cluster, CLUSTER_3?

thanks
anishek

On Mon, Mar 21, 2016 at 3:31 PM, Anishek Agarwal <an...@gmail.com> wrote:

> Hey Clint,
>
> We have two separate rings that don't talk to each other, but both have
> the same DC name, "DCX".
>
> @Raja,
>
> We had already gone down the path you suggested.
>
> thanks all
> anishek

Re: Multi DC setup for analytics

Posted by Anishek Agarwal <an...@gmail.com>.
Hey Clint,

We have two separate rings that don't talk to each other, but both have
the same DC name, "DCX".

@Raja,

We had already gone down the path you suggested.

thanks all
anishek

On Fri, Mar 18, 2016 at 8:01 AM, Reddy Raja <ar...@gmail.com> wrote:

> Yes. Here are the steps.
> You will have to change the DC names first.
> DC1 and DC2 would be independent clusters.
>
> Create a new DC, DC3, and include these two DCs on DC3.
>
> This should work well.

Re: Multi DC setup for analytics

Posted by Reddy Raja <ar...@gmail.com>.
Yes. Here are the steps.
You will have to change the DC names first.
DC1 and DC2 would be independent clusters.

Create a new DC, DC3, and include these two DCs on DC3.

This should work well.


On Thu, Mar 17, 2016 at 11:03 PM, Clint Martin <
clintlmartin@coolfiretechnologies.com> wrote:

> When you say you have two logical DCs, both with the same name, are you
> saying that you have two clusters of servers, both with the same DC name,
> neither of which currently talks to the other? I.e., they are two separate
> rings?
>
> Or do you mean that you have two keyspaces in one cluster?
>
> Or?
>
> Clint


-- 
"In this world, you either have an excuse or a story. I preferred to have a
story"

Re: Multi DC setup for analytics

Posted by Clint Martin <cl...@coolfiretechnologies.com>.
When you say you have two logical DCs, both with the same name, are you
saying that you have two clusters of servers, both with the same DC name,
neither of which currently talks to the other? I.e., they are two separate
rings?

Or do you mean that you have two keyspaces in one cluster?

Or?

Clint
On Mar 14, 2016 2:11 AM, "Anishek Agarwal" <an...@gmail.com> wrote:

> Hello,
>
> We are using Cassandra 2.0.17 and have two logical DCs that hold different
> keyspaces, but both have the same logical name, DC1.
>
> We want to set up another Cassandra cluster for analytics, which should get
> data from both of the above DCs.
>
> If we set up the new DC with the name DC2 and follow the steps at
> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
> will it work?
>
> I would think we first have to change the names of the existing clusters
> so that they differ, and then add another DC that gets its data from
> them?
>
> Also, as soon as we add the node, data starts moving... this will only be
> the real-time changes made to the cluster, right? We still have to run a
> rebuild to get the data for the tokens of the nodes in the new cluster?
>
> Thanks
> Anishek
>