
Posted to user@cassandra.apache.org by "Coe, Robin" <ro...@bluecoat.com> on 2009/12/03 21:54:25 UTC

data modeling question

I am evaluating Cassandra and playing with some data model ideas.  One
model I'm thinking about is to take privacy concerns into account, such
that interleaving tenant data in the same column family is not allowed.

Another reason I can think to keep the data for different tenants in
separate CFs is to simplify the data model, such that I can avoid using
SuperColumns to categorize the data.  I may have lots of columns and I
don't want to deserialize all of them when I have to drill into a super
column.

I expect one solution is to separate each tenant into their own keyspace
and use a separate connection pool for each.  I expect another
solution would be to create one column family per tenant.
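The two layouts differ mainly in how a tenant maps to a (keyspace, column family) pair and in how many connection pools the client has to hold. A minimal sketch follows; the naming conventions (`ks_<tenant>`, `Tenants`, `Data`, `Data_<tenant>`) and the caller-supplied pool factory are hypothetical, and no real Cassandra client API is used.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Tenant ids end up inside keyspace/CF names, so restrict them to a safe charset.
_TENANT_RE = re.compile(r"[A-Za-z0-9]+")


@dataclass(frozen=True)
class Target:
    keyspace: str
    column_family: str


def _check(tenant: str) -> str:
    if not _TENANT_RE.fullmatch(tenant):
        raise ValueError(f"unsafe tenant id: {tenant!r}")
    return tenant


def keyspace_per_tenant(tenant: str) -> Target:
    # Option 1 (hypothetical naming): one keyspace per tenant, same CF name in each.
    return Target(keyspace=f"ks_{_check(tenant)}", column_family="Data")


def cf_per_tenant(tenant: str) -> Target:
    # Option 2 (hypothetical naming): one shared keyspace, one CF per tenant.
    return Target(keyspace="Tenants", column_family=f"Data_{_check(tenant)}")


class TenantRouter:
    """Lazily creates one connection pool per keyspace and reuses it."""

    def __init__(self, strategy: Callable[[str], Target],
                 pool_factory: Callable[[str], object]) -> None:
        self._strategy = strategy
        self._pool_factory = pool_factory  # hypothetical: builds a pool bound to a keyspace
        self._pools: Dict[str, object] = {}

    def route(self, tenant: str) -> Tuple[object, Target]:
        target = self._strategy(tenant)
        if target.keyspace not in self._pools:
            self._pools[target.keyspace] = self._pool_factory(target.keyspace)
        return self._pools[target.keyspace], target

    def pool_count(self) -> int:
        return len(self._pools)
```

With `keyspace_per_tenant`, routing two tenants creates two pools; with `cf_per_tenant`, both tenants share one pool and are separated only by column family name.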

Since adding/removing CFs requires editing the storage-conf file and
restarting Cassandra to pick up the change, I'm wondering if there's a
better way to implement this model?  Alternatively, is there a mechanism
by which I can tell Cassandra to refresh/reload its config file,
without disrupting the work it's doing?

Thanks,
Robin.

RE: data modeling question

Posted by "Coe, Robin" <ro...@bluecoat.com>.

> If your ops team really can't be trusted with "push this new config
file out" then that should be a separate tool.

One that leaves a boot print? ;)  I agree in principle, I've just grown cautious over the years.

> Again, there are standard solutions for this.  Use one. :)  round
robin dns, haproxy, ...

*sigh*, I guess I didn't mention I'm alone on this evaluation project.  Meh, automatic failover can wait for when I get sign-off. :)

Thanks,
Robin.

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com] 
Sent: December 3, 2009 5:30 PM
To: cassandra-user@incubator.apache.org
Subject: Re: data modeling question

On Thu, Dec 3, 2009 at 4:23 PM, Coe, Robin <ro...@bluecoat.com> wrote:
> I *think* I like the idea of Cassandra pushing a CF change to its peers, as opposed to managing it by a separate admin task, simply because I wouldn't want a change managed by an application admin to be missed because of bad communication, forgetfulness, etc., with the Ops team.

Managing this w/in cassandra is bad separation of concerns.

If your ops team really can't be trusted with "push this new config
file out" then that should be a separate tool.

> So, considering that I currently have to take down a node to make a CF change, I'm wondering how to perform automatic failover from my application?  Is there a mechanism by which I can request from Cassandra all the destination IP:ports for the nodes in a cluster, so I can adapt dynamically?

Again, there are standard solutions for this.  Use one. :)  round
robin dns, haproxy, ...

-Jonathan
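Jonathan's suggestion to keep config distribution in a separate tool, and Robin's worry that a change could be missed, can both be handled by a small script that renders the per-tenant column family list and then checks that every node is running the same file. A minimal sketch: the element and attribute names (`Keyspace`, `ColumnFamily`, `Name`, `CompareWith="BytesType"`) are modeled on the storage-conf layout of that era and are an assumption to verify against the sample config shipped with your release; the `Data_<tenant>` naming and the way per-node hashes are collected are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List


def render_keyspace(name: str, tenants: Iterable[str]) -> str:
    # ASSUMPTION: storage-conf style <Keyspace Name=...><ColumnFamily .../></Keyspace>.
    ks = ET.Element("Keyspace", Name=name)
    for tenant in sorted(set(tenants)):  # sorted so the output is deterministic
        ET.SubElement(ks, "ColumnFamily", Name=f"Data_{tenant}", CompareWith="BytesType")
    return ET.tostring(ks, encoding="unicode")


def fingerprint(config_text: str) -> str:
    # A content hash lets the tool compare configs without diffing whole files.
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def stale_nodes(expected: str, deployed: Dict[str, str]) -> List[str]:
    """Return nodes whose reported config hash differs from the expected config."""
    want = fingerprint(expected)
    return sorted(node for node, digest in deployed.items() if digest != want)
```

How each node's hash is gathered (ssh, a config-management agent, etc.) is left to whatever the ops team already uses; the point is that the check lives outside Cassandra.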

RE: data modeling question

Posted by "Coe, Robin" <ro...@bluecoat.com>.

Yup, this looks like pretty good coverage of the topic.

I *think* I like the idea of Cassandra pushing a CF change to its peers, as opposed to managing it by a separate admin task, simply because I wouldn't want a change managed by an application admin to be missed because of bad communication, forgetfulness, etc., with the Ops team.

So, considering that I currently have to take down a node to make a CF change, I'm wondering how to perform automatic failover from my application?  Is there a mechanism by which I can request from Cassandra all the destination IP:ports for the nodes in a cluster, so I can adapt dynamically?  For example, if I ramp Cassandra instances up or down based on server load, I would like my application to know automatically which servers are available, so that it can reconnect on its own when the node I'm connected to goes down.

Thanks,
Robin.
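For Robin's reconnection question, the standard answers suggested in the thread (round-robin DNS, HAProxy) keep node discovery outside Cassandra. If the application has to do it itself, a minimal client-side sketch is to rotate through a host list and skip hosts that refuse connections. The host list and the `connect` callable are supplied by the caller; nothing here asks Cassandra for its membership.

```python
from typing import Callable, Generic, List, Sequence, Tuple, TypeVar

Host = Tuple[str, int]
C = TypeVar("C")


class RoundRobinFailover(Generic[C]):
    """Hand out connections round-robin, skipping hosts that fail to connect."""

    def __init__(self, hosts: Sequence[Host], connect: Callable[[Host], C]) -> None:
        if not hosts:
            raise ValueError("need at least one host")
        self._hosts = list(hosts)
        self._connect = connect
        self._next = 0  # index of the host to try first on the next call

    def connect(self) -> Tuple[Host, C]:
        n = len(self._hosts)
        errors: List[Tuple[Host, OSError]] = []
        for i in range(n):
            host = self._hosts[(self._next + i) % n]
            try:
                conn = self._connect(host)
            except OSError as exc:  # refused, timed out, unreachable, ...
                errors.append((host, exc))
                continue
            # Start the next call after the host just used, so load rotates.
            self._next = (self._next + i + 1) % n
            return host, conn
        raise ConnectionError(f"all {n} hosts failed: {errors}")
```

In practice `connect` might be `lambda h: socket.create_connection(h, timeout=2.0)` against each node's Thrift port (9160 by default at the time), and the host list could come from resolving a round-robin DNS name with `socket.getaddrinfo`. Putting HAProxy in front of the cluster removes even this loop from the application.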

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com] 
Sent: December 3, 2009 4:32 PM
To: cassandra-user@incubator.apache.org
Subject: Re: data modeling question

On Thu, Dec 3, 2009 at 3:22 PM, Coe, Robin <ro...@bluecoat.com> wrote:
> I seem to recall some talk about dynamically setting a keyspace or maybe it was dynamically loading column families?  Is this already a feature request, to your knowledge, or should I open a Jira issue?

https://issues.apache.org/jira/browse/CASSANDRA-44 maybe?

> If yes to opening a Jira issue, should it be the feature of dynamically reloading the config or dynamically creating CFs?

Well, it's sort of the same thing, in that you need the latter and the
former is the simplest way to actually interface with it.

-Jonathan
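Jonathan's point that reloading the config is mostly an interface onto dynamic CF creation can be made concrete: a reload boils down to diffing the column families declared in the new file against the ones the node is already running. Per the thread, no such reload mechanism existed at the time, so this is a purely hypothetical sketch; the schema is modeled as a plain mapping of keyspace name to column family names.

```python
from typing import Dict, List, Set, Tuple

Schema = Dict[str, Set[str]]  # keyspace name -> column family names
Change = Tuple[str, str]      # (keyspace, column family)


def schema_diff(running: Schema, reloaded: Schema) -> Tuple[List[Change], List[Change]]:
    """Return (to_create, to_drop) as sorted (keyspace, column family) pairs."""
    to_create: List[Change] = []
    to_drop: List[Change] = []
    for ks in running.keys() | reloaded.keys():
        old = running.get(ks, set())
        new = reloaded.get(ks, set())
        to_create += [(ks, cf) for cf in new - old]  # declared now, not yet live
        to_drop += [(ks, cf) for cf in old - new]    # live, no longer declared
    return sorted(to_create), sorted(to_drop)
```

A reload handler would then apply the creates and drops one by one, which is exactly the "dynamically creating CFs" capability the feature request needs underneath.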

RE: data modeling question

Posted by "Coe, Robin" <ro...@bluecoat.com>.

I seem to recall some talk about dynamically setting a keyspace or maybe it was dynamically loading column families?  Is this already a feature request, to your knowledge, or should I open a Jira issue?  If yes to opening a Jira issue, should it be the feature of dynamically reloading the config or dynamically creating CFs?

Thanks,
Robin.

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com] 
Sent: December 3, 2009 4:02 PM
To: cassandra-user@incubator.apache.org
Subject: Re: data modeling question

On Thu, Dec 3, 2009 at 2:54 PM, Coe, Robin <ro...@bluecoat.com> wrote:
> Since adding/removing CFs requires editing the storage-conf file and
> restarting Cassandra to pick up the change, I'm wondering if there's a
> better way to implement this model?  Alternatively, is there a mechanism by
> which I can tell Cassandra to refresh/reload its config file, without
> disrupting the work it's doing?

Nope.  But fixing this would be the Right Way to approach things IMO.

-Jonathan
