Posted to user@cassandra.apache.org by Jean Tremblay <je...@zen-innovations.com> on 2016/01/27 15:49:18 UTC

Rename Keyspace offline

Hi,

I have a huge set of data, which takes about 2 days to bulk load on a Cassandra 3.0 cluster of 5 nodes. That is about 13 billion rows.

Quite often I need to reload this data, either because the structure is new or because the data has been reorganised. There are clients reading from a given keyspace (KS-X).

Since it takes me 2 days to load my data, I was planning to load the new set on a new keyspace (KS-Y), and when loaded drop KS-X and rename KS-Y to KS-X.

Now I know that renaming a keyspace is functionality that was removed.

Would this procedure work to destroy an old keyspace KS-X and rename a new keyspace KS-Y to KS-X:

1) nodetool drain each node.
2) stop cassandra on each node.
3) on each node:
	3.1) rm -r data/KS-X
	3.2) mv data/KS-Y data/KS-X
4) restart each node.

Could someone please confirm this? I guess it would work, but I’m afraid that some system table could hold information that would prevent this.

Thanks for your help.

Cheers

Jean

Re: Rename Keyspace offline

Posted by Jack Krupansky <ja...@gmail.com>.
If you are doing this full bulk reload a lot, it may make more sense to use
a separate cluster to bring up the new data and then atomically switch your
clients/apps to the IP address of the new cluster once you've validated it,
and then decommission and recycle the machines of the old cluster. This
would maximize performance of the production cluster and maximize
performance of the staging process as well. And you would need less
hardware for each node/cluster as well since you won't need to support two
copies of the data on a single node/cluster. It will also make it a lot
easier to upgrade the cluster without worrying about impact on production
during the upgrade since the client/app would only ever see a fully
consistent cluster. (I lost count of how many wins this approach would give
you!)

-- Jack Krupansky

On Wed, Jan 27, 2016 at 10:53 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Why rename the keyspace? If it was me I'd just give it a name that
> includes the date or some identifier and include that logic in my app.
> That's way easier.

Re: Rename Keyspace offline

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Why rename the keyspace? If it was me I'd just give it a name that includes
the date or some identifier and include that logic in my app. That's way
easier.
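
A minimal sketch of this naming approach, in Python. The `ks` prefix, the
YYYYMMDD suffix and both helper names are assumptions made for illustration;
none of them come from the thread or from a Cassandra driver. The application
would obtain the list of existing keyspace names however it already does and
pick the newest match.

```python
import re
from datetime import date
from typing import List, Optional

# Hypothetical naming convention: <prefix>_<YYYYMMDD>, e.g. ks_20160127.
_DATED = re.compile(r"^(?P<prefix>[a-z][a-z0-9_]*)_(?P<stamp>\d{8})$")


def dated_keyspace(prefix: str, day: date) -> str:
    """Build a keyspace name that embeds the load date."""
    return f"{prefix}_{day:%Y%m%d}"


def latest_keyspace(prefix: str, existing: List[str]) -> Optional[str]:
    """Return the most recent dated keyspace for `prefix`, or None."""
    stamps = []
    for name in existing:
        m = _DATED.match(name)
        if m and m.group("prefix") == prefix:
            stamps.append(m.group("stamp"))
    # YYYYMMDD strings sort chronologically, so max() is the newest load.
    return f"{prefix}_{max(stamps)}" if stamps else None
```

With this convention the old keyspace can simply be dropped once no client
reads from it any more, and no rename is needed.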

Re: Rename Keyspace offline

Posted by Jean Tremblay <je...@zen-innovations.com>.
Thank you all for your replies.
My main objective was not to change my client.
After your answers, it makes a lot of sense to modify my client so that it accepts different keyspace names. This way I will no longer need to rename a keyspace; I simply need to develop a way to tell my client that there is a new keyspace.

Thanks again for your feedback
Jean
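
One way to "tell the client" could look like the Python sketch below. It is
only an illustration: the pointer file, its JSON layout and the function names
are all assumptions, and the pointer could just as well live in a
configuration service or a small table. The loader publishes the new keyspace
name once the bulk load has been validated, and clients re-read it.

```python
import json
import os
import tempfile
from pathlib import Path


def publish_keyspace(pointer: Path, keyspace: str) -> None:
    """Atomically record the keyspace that clients should read from."""
    # Write to a temporary file in the same directory, then rename it over
    # the pointer so readers never observe a half-written file.
    fd, tmp = tempfile.mkstemp(dir=str(pointer.parent))
    with os.fdopen(fd, "w") as fh:
        json.dump({"keyspace": keyspace}, fh)
    os.replace(tmp, str(pointer))


def current_keyspace(pointer: Path, default: str) -> str:
    """Read the active keyspace, falling back to `default` if none is published."""
    try:
        return json.loads(pointer.read_text())["keyspace"]
    except FileNotFoundError:
        return default
```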


Re: Rename Keyspace offline

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Jan 27, 2016 at 6:49 AM, Jean Tremblay <
jean.tremblay@zen-innovations.com> wrote:

> Since it takes me 2 days to load my data, I was planning to load the new
> set on a new keyspace (KS-Y), and when loaded drop KS-X and rename KS-Y to
> KS-X.
>

Why bother with the rename? Just have two keyspaces, foo and foo_, and
alternate your bulk loads between truncating them?
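
The alternation can be sketched in a few lines of Python. The function name
is hypothetical and the pair of names is taken from the suggestion above; how
the application remembers which keyspace is currently active is left open.

```python
from typing import Tuple


def next_load_target(active: str, pair: Tuple[str, str] = ("foo", "foo_")) -> str:
    """Return the keyspace that is NOT serving reads, i.e. the one to
    truncate and bulk load next."""
    if active not in pair:
        raise ValueError(f"{active!r} is not one of {pair}")
    first, second = pair
    return second if active == first else first
```

After a load into the returned keyspace has been validated, it becomes the
active one and the other keyspace is the target of the following load.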


> Would this procedure work to destroy an old keyspace KS-X and rename a new
> keyspace KS-Y to KS-X:
>

Yes, if you include:

0) Load schema for KS-Y into KS-X

> 1) nodetool drain each node.
> 2) stop cassandra on each node.
> 3) on each node:
>         3.1) rm -r data/KS-X
>         3.2) mv data/KS-Y data/KS-X
> 4) restart each node.
>

Note also that in step 3.2, the uuid component of file and/or directory
names will have to be changed.
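
To make the uuid point concrete, here is a dry-run sketch in Python that only
plans the directory renames for step 3.2. It assumes table directories are
laid out as <data_dir>/<keyspace>/<table>-<32 hex characters>, and that the
ids of the tables created in the destination keyspace by step 0 are supplied
by the operator (the `dst_ids` argument is hypothetical). It does not cover
file names inside those directories, which may also need changes depending on
the version.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed on-disk layout: <data_dir>/<keyspace>/<table>-<32 hex chars>/
_TABLE_DIR = re.compile(r"^(?P<table>.+)-(?P<id>[0-9a-f]{32})$")


def plan_moves(data_dir: Path, src_ks: str, dst_ks: str,
               dst_ids: Dict[str, str]) -> List[Tuple[Path, Path]]:
    """List (source, destination) directory pairs without moving anything.

    dst_ids maps table name -> id of that table in the destination keyspace.
    """
    moves = []
    for entry in sorted((data_dir / src_ks).iterdir()):
        m = _TABLE_DIR.match(entry.name)
        if not entry.is_dir() or m is None:
            continue  # skip anything that does not look like a table directory
        table = m.group("table")
        if table not in dst_ids:
            raise KeyError(f"no destination id known for table {table!r}")
        moves.append((entry, data_dir / dst_ks / f"{table}-{dst_ids[table]}"))
    return moves
```

Printing the returned pairs and reviewing them before running any mv keeps
the destructive part of the procedure a separate, deliberate step.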

=Rob

Re: Rename Keyspace offline

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
>
>  3.1) rm -r data/KS-X
>  3.2) mv data/KS-Y data/KS-X


This won't work: SSTable names contain the keyspace name.

I had this issue too (I wanted to split a keyspace into multiple ones and
use the occasion to rename tables, etc.).

I finally ended up writing a small Python script, available here:
https://github.com/arodrime/cassandra-tools/blob/master/operations/move_table.py
It allowed me to move any ks.cf to ks.cf2, ks2.cf2, or ks2.cf.

I used it in production at my previous job and it worked like a charm. We
were really happy with it, but I won't assume any responsibility. I just
hope it will be useful to you.

One last warning: it was written to be compatible with my environment, so
you might need to adjust a few things or improve the code to expose
anything you need as an option. Feel free to do whatever you want with this
code.

Anyway, you have the logic in there at least.

C*heers,

-----------------
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com
