Posted to user@cassandra.apache.org by Ersin Er <er...@gmail.com> on 2012/08/22 08:44:07 UTC

Cluster per Application vs. Multi-Application Clusters

Hi all,

What are the advantages of allocating a cluster for a single application vs.
running multiple applications on the same Cassandra cluster? Is either
model recommended over the other?

Thanks.

-- 
Ersin Er

Re: Cluster per Application vs. Multi-Application Clusters

Posted by "Hiller, Dean" <De...@nrel.gov>.
True, running everything in one cluster is very comparable to putting your
application on Amazon's cloud. When you have lots of apps, a nightly batch
job can use resources that your daytime apps leave idle. There are always
tradeoffs, of course: if both apps peak at the same time... well,
you get the picture.

Dean


Re: Cluster per Application vs. Multi-Application Clusters

Posted by Edward Capriolo <ed...@gmail.com>.
If you are starting out small, one logical/physical cluster is probably
the best and only approach.

Long term this depends heavily on the case, but I generally believe
Cluster per Application is the best approach, although I think of it as
"Cluster per QOS".

For our use cases I find that two applications can have very different
data sizes and quality of service requirements. For example, one
application may have a small dataset with highly repeated reads and a
high cache hit rate, while another may have a large, sparse dataset and
a "random read pattern". Also, one application may demand fast reads
(< 3 ms) while the other may find 10 or 20 ms reads acceptable.

When those two applications are placed on the same set of hardware, you
end up scaling them both even though at a given time only one or the
other needs to be scaled. In extreme cases applications 1 and 2 contend
for the same resources and make each other unhappy.

What is best to do is architect your systems in such a way that moving
an individual column family to a new set of hardware is not difficult.
This might involve something like a MapReduce program that can bulk load
existing data between two clusters, while your front end application
sends the writes/updates/deletes to both the old and the new cluster.
Also make sure your application does not have too many hard coded
touch points that assume a single cluster.
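
A minimal sketch of the dual-write idea described above, in Python. Everything
here is hypothetical and for illustration only: InMemoryCluster stands in for
a real cluster client, and ClusterRouter, start_migration and
finish_migration are made-up names, not part of any Cassandra driver. The
point is that the application asks one router where a column family lives,
so moving it to new hardware changes a routing table rather than call sites.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


class InMemoryCluster:
    """Stand-in for a real cluster client (hypothetical, illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[Tuple[str, str], dict] = {}

    def write(self, column_family: str, key: str, columns: dict) -> None:
        self.rows.setdefault((column_family, key), {}).update(columns)

    def delete(self, column_family: str, key: str) -> None:
        self.rows.pop((column_family, key), None)

    def read(self, column_family: str, key: str) -> Optional[dict]:
        return self.rows.get((column_family, key))


@dataclass
class Route:
    read_from: InMemoryCluster
    write_to: List[InMemoryCluster]


class ClusterRouter:
    """Single touch point that maps each column family to its cluster(s)."""

    def __init__(self, default: InMemoryCluster) -> None:
        self.default = default
        self.routes: Dict[str, Route] = {}

    def _route(self, column_family: str) -> Route:
        # Column families without an explicit route live on the default cluster.
        return self.routes.get(
            column_family, Route(self.default, [self.default])
        )

    def start_migration(self, column_family: str, new: InMemoryCluster) -> None:
        # Keep reading from the old cluster, but send mutations to both,
        # so the new cluster stays current while a bulk load copies old data.
        old = self._route(column_family).read_from
        self.routes[column_family] = Route(read_from=old, write_to=[old, new])

    def finish_migration(self, column_family: str, new: InMemoryCluster) -> None:
        # Once the bulk load is done, cut reads and writes over to the new cluster.
        self.routes[column_family] = Route(read_from=new, write_to=[new])

    def write(self, column_family: str, key: str, columns: dict) -> None:
        for cluster in self._route(column_family).write_to:
            cluster.write(column_family, key, columns)

    def delete(self, column_family: str, key: str) -> None:
        for cluster in self._route(column_family).write_to:
            cluster.delete(column_family, key)

    def read(self, column_family: str, key: str) -> Optional[dict]:
        return self._route(column_family).read_from.read(column_family, key)
```

The bulk copy of existing rows is deliberately left out; this sketch only
covers the live write path. A real version would also need to decide what to
do when a write succeeds on one cluster and fails on the other.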

As you mentioned, one thing gained from keeping everything in the same
keyspace is connection pooling. However, unlike the RDBMS world, where
coordinated transactions have to happen in order and so on, that is not
the case with C*, so getting all data into the same physical "system"
is not as important.




Re: Cluster per Application vs. Multi-Application Clusters

Posted by "Hiller, Dean" <De...@nrel.gov>.
Just an opinion here, as we are having to do this ourselves, loading tons of researchers' datasets into one cluster.  We are going the path of one keyspace, as it makes it easier if you ever want to mine the data: you don't have to keep building different clients for another keyspace.  We ended up adding our own security layer as well, so researchers can expose their datasets to other researchers and, once exposed, other researchers can join that data with their existing data.

This of course is just one use case, but if 10 applications use Cassandra, you may still find a benefit in having an 11th data mining app look at the data from all 10 apps.
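
To make the "expose a dataset to other researchers" idea concrete, here is a
rough Python sketch of a permission registry sitting above a shared keyspace.
It is not the security layer mentioned above, whose design is not described
in this thread; DatasetAccess and all of its methods are hypothetical names
chosen for illustration.

```python
from typing import Dict, Set


class DatasetAccess:
    """Hypothetical permission registry layered above one shared keyspace."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}
        self._readers: Dict[str, Set[str]] = {}

    def register(self, dataset: str, owner: str) -> None:
        # A new dataset is readable only by its owner until it is exposed.
        if dataset in self._owner:
            raise ValueError(f"dataset already registered: {dataset}")
        self._owner[dataset] = owner
        self._readers[dataset] = {owner}

    def expose(self, dataset: str, owner: str, reader: str) -> None:
        # Only the owner may grant another researcher read access.
        if self._owner.get(dataset) != owner:
            raise PermissionError(f"{owner} does not own {dataset}")
        self._readers[dataset].add(reader)

    def can_read(self, dataset: str, user: str) -> bool:
        return user in self._readers.get(dataset, set())

    def joinable(self, user: str) -> Set[str]:
        # Datasets this user may combine in one query: own plus exposed ones.
        return {d for d, readers in self._readers.items() if user in readers}
```

Because every dataset lives in the same keyspace, a data mining client would
only need to consult something like joinable() to learn what it may combine,
rather than connecting to a separate keyspace per application.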

Later,
Dean

playOrm Developer
