Posted to user@cassandra.apache.org by Jeremy Jongsma <je...@barchart.com> on 2014/07/08 19:53:58 UTC

New application - separate column family or separate cluster?

Do you prefer purpose-specific Cassandra clusters that support a single
application's data set, or a single Cassandra cluster that contains column
families for many applications? I realize there is no ideal answer for
every situation, but what have your experiences been in this area for
cluster planning?

My reason for asking is that we have one application with high data volume
(multiple TB, thousands of writes/sec) that caused us to adopt Cassandra in
the first place. Now we have the tools and cluster management
infrastructure built up to the point where it is not a major investment to
store smaller sets of data for other applications in C* also, and I am
debating whether to:

1) Store everything in one large cluster (no isolation, low cost)
2) Use one cluster for the high-volume data, and one for everything else
(good isolation, medium cost)
3) Give every major service its own cluster, even if they have small
amounts of data (best isolation, highest cost)

I suspect #2 is the way to go as far as balancing hosting costs and
application performance isolation. Are there any pros or cons I am missing?

-j

Re: New application - separate column family or separate cluster?

Posted by Jeremy Jongsma <je...@barchart.com>.

Thanks Tupshin. I am thinking #2 is the way to go in my case, and I always
have the option of migrating column families to a new cluster if needed.

Parag, at the traffic volumes I'm talking about, #2 (and especially #3)
will have a lot more total VM nodes, because the other apps are used
lightly enough that there is no reason to add capacity specifically for
them to an already large cluster. But app-specific clusters would need at
least 3 nodes each (for redundancy) even when the actual traffic load would
require less than one node, hence the increased node costs.
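
To make the node-count argument concrete, here is a rough sketch of the
arithmetic. The per-app loads are made-up numbers (expressed in "nodes'
worth" of traffic), and the function names are purely illustrative; the only
figure taken from the discussion is the 3-node minimum per cluster.

```python
import math

MIN_NODES_PER_CLUSTER = 3  # redundancy floor mentioned above


def nodes_needed(load_in_nodes):
    """Nodes for one cluster: enough for the load, never below the floor."""
    return max(MIN_NODES_PER_CLUSTER, math.ceil(load_in_nodes))


def total_nodes(clusters):
    """Each inner list holds the loads of the apps sharing one cluster."""
    return sum(nodes_needed(sum(loads)) for loads in clusters)


# Hypothetical loads: one high-volume app and three lightly used ones.
big = 12.0
small = [0.3, 0.2, 0.4]

option_1 = total_nodes([[big] + small])                 # one shared cluster
option_2 = total_nodes([[big], small])                  # big app alone, rest together
option_3 = total_nodes([[big]] + [[s] for s in small])  # one cluster per app

print(option_1, option_2, option_3)
```

With these assumed loads the light apps barely change the size of a shared
cluster, while every extra cluster pays the 3-node floor regardless of how
little traffic it serves.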


On Wed, Jul 9, 2014 at 7:07 AM, Parag Patel <pp...@clearpoolgroup.com>
wrote:

>  In your scenario #1, is the total number of nodes staying the same?
> Meaning, if you launch multiple clusters for #2, you’d have N total nodes –
> are we assuming #1 has N or less than N?
>
>
>
> If #1 and #2 both have N, wouldn’t the performance be the same since
> Cassandra’s performance increases linearly?

RE: New application - separate column family or separate cluster?

Posted by Parag Patel <pp...@clearpoolgroup.com>.

In your scenario #1, is the total number of nodes staying the same?  Meaning, if you launch multiple clusters for #2, you’d have N total nodes – are we assuming #1 has N or less than N?

If #1 and #2 both have N, wouldn’t the performance be the same since Cassandra’s performance increases linearly?

From: Tupshin Harper [mailto:tupshin@tupshin.com]
Sent: Tuesday, July 08, 2014 11:13 PM
To: user@cassandra.apache.org
Subject: Re: New application - separate column family or separate cluster?

I've seen a lot of deployments, and I think you captured the scenarios and reasoning quite well. You can apply other nuances and details to #2 (e.g. segment based on SLA or topology), but I agree with all of your reasoning.

-Tupshin
-Global Field Strategy
-Datastax

Re: New application - separate column family or separate cluster?

Posted by Tupshin Harper <tu...@tupshin.com>.

I've seen a lot of deployments, and I think you captured the scenarios and
reasoning quite well. You can apply other nuances and details to #2 (e.g.
segment based on SLA or topology), but I agree with all of your reasoning.

-Tupshin
-Global Field Strategy
-Datastax