Posted to user@cassandra.apache.org by Yang <te...@gmail.com> on 2011/08/02 00:08:16 UTC

implications of using more keyspaces vs single keyspace?

For example, my data consists of "salary" and "office stationery list".

Let's say I use the same replicationStrategy for both of them, but these 2
data sets have different key ranges and key distributions.

Is it then better to use a separate keyspace for each of them, or a
single one?

The factors I can think of for separate keyspaces:
  - you have to call set_keyspace() if your calls switch between datasets
  - you keep the option of changing to a different replication factor in
    the future

Any thoughts?

Thanks a lot
Yang

Re: implications of using more keyspaces vs single keyspace?

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Aug 1, 2011 at 6:08 PM, Yang <te...@gmail.com> wrote:

Ah, interesting question.

In the old days, operations like get() took the keyspace as their first
string argument. Now, changing keyspace requires calling
setKeyspace(String), which is an extra RPC operation. If you want to
interact with two keyspaces, you either need to keep two connection pools
open or make an extra RPC call every time you want to change keyspaces.
While the smaller signature for get() is nice, having the extra RPC call
is not good.
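To make the cost of switching concrete, here is a minimal Python sketch. It assumes a hypothetical `StubConnection` that only counts round trips (it does not talk to Cassandra) and hypothetical keyspace names `Salary` and `Stationery`; `KeyspaceSwitchingClient` is an illustrative wrapper, not part of any real client library. It shares one connection and calls set_keyspace only when the requested keyspace differs from the current one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubConnection:
    """Hypothetical stand-in for a Thrift connection; it only counts round trips."""
    rpc_count: int = 0
    keyspace: Optional[str] = None

    def set_keyspace(self, keyspace: str) -> None:
        self.rpc_count += 1  # switching keyspace is its own round trip
        self.keyspace = keyspace

    def get(self, key: str, column: str) -> str:
        self.rpc_count += 1  # the read itself
        return f"{self.keyspace}/{key}/{column}"


class KeyspaceSwitchingClient:
    """Hypothetical wrapper: one shared connection, keyspace switched lazily."""

    def __init__(self, conn: StubConnection) -> None:
        self.conn = conn

    def get(self, keyspace: str, key: str, column: str) -> str:
        # Pay for set_keyspace only when the keyspace actually changes.
        if self.conn.keyspace != keyspace:
            self.conn.set_keyspace(keyspace)
        return self.conn.get(key, column)


if __name__ == "__main__":
    client = KeyspaceSwitchingClient(StubConnection())
    # Alternating between two keyspaces: every read also needs a switch.
    for _ in range(3):
        client.get("Salary", "alice", "amount")
        client.get("Stationery", "office1", "pens")
    print(client.conn.rpc_count)  # 12: six reads plus six keyspace switches
```

Caching the current keyspace avoids redundant calls when requests stay in one keyspace, but a workload that alternates between datasets still pays one extra round trip per read.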

However, as you mentioned, you can only set different replication factors
at the keyspace level. That is nice, especially if you find one column
family is not as important as another. Since a keyspace is a folder on
disk, you can also mount a keyspace on a different physical device.

I still like one column family per keyspace, but having N connection pools
for N keyspaces complicates things.
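The pool-per-keyspace alternative might look roughly like the following Python sketch. `StubConnection` and `PerKeyspacePools` are hypothetical names; the stub only counts round trips, and the pool is deliberately simplified (single-threaded, fixed size, no health checks or timeouts). Each connection is bound to its keyspace once, when it is created, so individual reads never need set_keyspace.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional


class StubConnection:
    """Hypothetical stand-in for a Thrift connection; it only counts round trips."""

    def __init__(self) -> None:
        self.rpc_count = 0
        self.keyspace: Optional[str] = None

    def set_keyspace(self, keyspace: str) -> None:
        self.rpc_count += 1
        self.keyspace = keyspace

    def get(self, key: str, column: str) -> str:
        self.rpc_count += 1
        return f"{self.keyspace}/{key}/{column}"


class PerKeyspacePools:
    """Hypothetical registry holding one small pool per keyspace (not thread-safe)."""

    def __init__(self, factory: Callable[[], StubConnection], size: int) -> None:
        self.factory = factory
        self.size = size
        self.pools: Dict[str, Deque[StubConnection]] = {}

    def _pool(self, keyspace: str) -> Deque[StubConnection]:
        # Create the pool lazily the first time a keyspace is used.
        if keyspace not in self.pools:
            pool: Deque[StubConnection] = deque()
            for _ in range(self.size):
                conn = self.factory()
                conn.set_keyspace(keyspace)  # paid once per connection, not per read
                pool.append(conn)
            self.pools[keyspace] = pool
        return self.pools[keyspace]

    def get(self, keyspace: str, key: str, column: str) -> str:
        pool = self._pool(keyspace)
        conn = pool.popleft()  # borrow a connection already bound to this keyspace
        try:
            return conn.get(key, column)
        finally:
            pool.append(conn)  # give it back


if __name__ == "__main__":
    pools = PerKeyspacePools(StubConnection, size=2)
    for _ in range(3):
        pools.get("Salary", "alice", "amount")
        pools.get("Stationery", "office1", "pens")
    open_conns = sum(len(p) for p in pools.pools.values())
    total_rpcs = sum(c.rpc_count for p in pools.pools.values() for c in p)
    print(open_conns, total_rpcs)  # 4 10
```

In the demo, set_keyspace is paid only 4 times (once per connection) for 6 reads, but the client now holds 4 open connections instead of one, and that number grows with every keyspace added: the trade-off of N pools for N keyspaces.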