Posted to user@cassandra.apache.org by Josep Blanquer <bl...@rightscale.com> on 2011/08/26 00:25:23 UTC

Live migrating data from 2 separate cassandra clusters

Hi,

 I am looking for an efficient way to migrate a portion of the data existing in
a Cassandra cluster to another, separate Cassandra cluster. What I need is
to solve the typical live migration problem that appears in any "DB
sharding" where you need to transfer "ownership" of certain rows from DB1 to
DB2...but in a way that clients see no (or almost no) disruption when you
actually do the cutover to DB2 for those writes.

I mean doing something as typical as:

loop (until almost no rows have been modified):
 rows = SELECT * from T where "criteria matches (i.e., shard_id=1) " AND
updated_at > last_time
 last_time = now
 insert(rows) elsewhere
end
...
"lock" modifications to original DB
do one last SELECT to get the last few modified rows
cut over the ownership (change it and ensure the clients know that the new
home for that data is in the other "DB")
unlock modifications
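
To make the steps above concrete, here is a rough Python sketch of the same
idea. It is only an illustration: plain dicts stand in for the two "DBs", and
every name in it (copy_changed_rows, migrate_shard, cutover, the shard_id and
updated_at fields, the ownership map, the write lock) is made up for the
example, not part of any Cassandra API. It also assumes each row carries its
own updated_at value and that writers take the same lock during the cutover.

```python
def copy_changed_rows(source, dest, shard_id, since):
    """Copy rows of one shard whose updated_at is >= since; return the count."""
    copied = 0
    for key, row in source.items():
        if row["shard_id"] == shard_id and row["updated_at"] >= since:
            dest[key] = dict(row)  # re-copying a row is harmless (idempotent)
            copied += 1
    return copied


def migrate_shard(source, dest, shard_id, clock, threshold=0, max_passes=10):
    """Repeat incremental copies until a pass copies at most `threshold` rows.

    `clock` is a hypothetical callable returning the current time. The
    timestamp is taken *before* each scan so that rows modified while the
    scan is running are picked up again by the next pass.
    """
    last_time = float("-inf")  # first pass copies everything in the shard
    for _ in range(max_passes):
        pass_start = clock()
        copied = copy_changed_rows(source, dest, shard_id, last_time)
        last_time = pass_start
        if copied <= threshold:
            break
    return last_time


def cutover(source, dest, shard_id, last_time, ownership, write_lock):
    """Block writers, copy the last few modified rows, then switch ownership."""
    with write_lock:  # "lock" modifications to the original DB
        copy_changed_rows(source, dest, shard_id, last_time)
        ownership[shard_id] = "DB2"  # clients now look for this shard in DB2
    # leaving the with-block "unlocks" modifications
```

The comparison uses >= rather than > on purpose: a row written at exactly the
recorded time gets copied twice instead of being missed, which is fine as long
as the insert on the other side simply overwrites.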


 So, anyway, I thought that I'd be able to apply the same principles by
passing a timestamp of sorts to the get_slices call so I could further
restrict getting only matching columns that have timestamps newer than the
one passed. Now, looking at the thrift interface I see that there is no
timestamp parameter at all...which makes me wonder how people are doing it,
and if there are any well-known practices for it. Setting up a full new
replicating DC within the same cluster doesn't work, as there are some clear
cases where you want to have completely separate cassandra rings.

Cheers,

 Josep M.