Posted to user@cassandra.apache.org by 張 睿 <ch...@cyberagent.co.jp> on 2012/10/30 12:18:11 UTC

Data migration between clusters

Hi,

We have several Cassandra clusters in our department, each serving a single
application.
We are now considering merging these clusters into a single one that will
serve all applications using Cassandra, each with its own keyspace.
We tried changing all the cluster names to the same name, but it did not
work well: some data was lost. Maybe we did it the wrong way.
Does anyone here know an efficient way to migrate the data of multiple
Cassandra clusters into a single Cassandra cluster without any data loss?

Thanks,
Ray

-- 
Ray Zhang
Cyberagent.co


Re: Data migration between clusters

Posted by 張 睿 <ch...@cyberagent.co.jp>.
Hi Rob,

Thank you for your reply.
Our scenario is this: we have 3 clusters, each with 3 nodes and 1 or 2
keyspaces.
We are now considering integrating these 3 clusters (9 nodes in total)
into a single cluster of 9 nodes.
The new cluster will contain all the keyspaces and data of the former 3
clusters.
The replication factor, currently 3, will not change during the
migration.
We tried sstableloader, but it did not work well. Maybe we used it the
wrong way.
The migration approach you suggested looks like it would solve our
problem;
we'll try it out, referring to the link you gave in your mail.
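
Before copying anything, it may help to confirm that no keyspace name is
used by more than one source cluster, since the merged cluster holds all
keyspaces in one namespace. A minimal sketch in Python; the cluster and
keyspace names below are hypothetical placeholders, not our real ones:

```python
from typing import Dict, List


def merge_keyspaces(clusters: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each keyspace to the cluster it comes from.

    Raises ValueError if the same keyspace name appears in two clusters,
    because the merged cluster could not keep their data apart.
    """
    owner: Dict[str, str] = {}
    for cluster, keyspaces in clusters.items():
        for keyspace in keyspaces:
            if keyspace in owner:
                raise ValueError(
                    f"keyspace {keyspace!r} exists in both "
                    f"{owner[keyspace]} and {cluster}"
                )
            owner[keyspace] = cluster
    return owner


# Hypothetical layout: 3 clusters, each with 1 or 2 keyspaces.
EXAMPLE_CLUSTERS = {
    "cluster_a": ["app_a"],
    "cluster_b": ["app_b1", "app_b2"],
    "cluster_c": ["app_c"],
}

merged = merge_keyspaces(EXAMPLE_CLUSTERS)
```

If the call raises, one of the colliding keyspaces would have to be
handled separately before the merge.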

Thanks again for the valuable information,
Ray

(12/11/01 2:43), Rob Coli wrote:
> On Tue, Oct 30, 2012 at 4:18 AM, 張 睿 <ch...@cyberagent.co.jp> wrote:
>> Does anyone here know if there is an efficient way to migrate multiple
>> cassandra clusters' data
>> to a single cassandra cluster without any dataloss.
> Yes.
>
> 1) create schema which is superset of all columnfamilies and all keyspaces
> 2) if all source clusters were the same fixed number of nodes, create
> a new cluster with the same fixed number of nodes
> 3) nodetool drain and shut down all nodes on all participating clusters
> 4) copy sstables from old clusters, ensuring that data from source
> node [x] ends up on target node [x]
> 5) start cassandra
>
> However without more details as to your old clusters, new clusters,
> and availability requirements, I can't give you a more useful answer.
>
> Here's some background on bulk loading, including "copy-the-sstables."
>
> http://palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
>
> =Rob
>

-- 
Ray Zhang
Cyberagent.co


Re: Data migration between clusters

Posted by Rob Coli <rc...@palominodb.com>.
On Tue, Oct 30, 2012 at 4:18 AM, 張 睿 <ch...@cyberagent.co.jp> wrote:
> Does anyone here know if there is an efficient way to migrate multiple
> cassandra clusters' data
> to a single cassandra cluster without any dataloss.

Yes.

1) create schema which is superset of all columnfamilies and all keyspaces
2) if all source clusters were the same fixed number of nodes, create
a new cluster with the same fixed number of nodes
3) nodetool drain and shut down all nodes on all participating clusters
4) copy sstables from old clusters, ensuring that data from source
node [x] ends up on target node [x]
5) start cassandra
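
The node-to-node mapping in step 4 can be sketched as a small planning
script. This is only an illustration under stated assumptions: the host
names and keyspace names are hypothetical, the data directory is assumed
to be /var/lib/cassandra/data, and node [x] in every source cluster is
assumed to sit at the same ring position as target node [x]. It only
lists what should be copied where; the copy itself (after the drain and
shutdown in step 3) is left to whatever tool you prefer.

```python
from typing import Dict, List, Tuple

# Hypothetical hosts and keyspaces; replace with your own.
SOURCE_CLUSTERS: Dict[str, Dict[str, List[str]]] = {
    "cluster_a": {"nodes": ["a1", "a2", "a3"], "keyspaces": ["app_a"]},
    "cluster_b": {"nodes": ["b1", "b2", "b3"], "keyspaces": ["app_b1", "app_b2"]},
    "cluster_c": {"nodes": ["c1", "c2", "c3"], "keyspaces": ["app_c"]},
}
TARGET_NODES: List[str] = ["t1", "t2", "t3"]
DATA_DIR = "/var/lib/cassandra/data"  # assumed data directory


def build_copy_plan(
    sources: Dict[str, Dict[str, List[str]]],
    targets: List[str],
    data_dir: str = DATA_DIR,
) -> List[Tuple[str, str, str]]:
    """Return (source_host, target_host, keyspace_dir) for every copy.

    Source node [x] of each cluster is paired with target node [x].
    """
    plan: List[Tuple[str, str, str]] = []
    for name, cluster in sources.items():
        nodes = cluster["nodes"]
        if len(nodes) != len(targets):
            # Step 2 requires the same fixed number of nodes everywhere.
            raise ValueError(
                f"{name} has {len(nodes)} nodes, target has {len(targets)}"
            )
        for src, dst in zip(nodes, targets):
            for keyspace in cluster["keyspaces"]:
                plan.append((src, dst, f"{data_dir}/{keyspace}"))
    return plan


plan = build_copy_plan(SOURCE_CLUSTERS, TARGET_NODES)
for src, dst, path in plan:
    print(f"{src}:{path} -> {dst}:{path}")
```

The node-count check mirrors the precondition in step 2: if the source
clusters differ in size, the simple [x]-to-[x] pairing no longer applies.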

However without more details as to your old clusters, new clusters,
and availability requirements, I can't give you a more useful answer.

Here's some background on bulk loading, including "copy-the-sstables."

http://palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

=Rob

-- 
=Robert Coli
AIM&GTALK - rcoli@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb