Posted to user@cassandra.apache.org by Oleg Proudnikov <ol...@cloudorange.com> on 2012/03/26 16:22:43 UTC

One or Two clusters?

Hi,

Could someone please help me understand the benefits of having a single large cluster vs. two smaller clusters separated by pattern of use? One MOSTLY WRITE cluster would incrementally accumulate large amounts of data throughout the day. The daily increment would be processed, summarized and stored in the second, READ cluster at night. Users would only need to interact with the READ portion of the overall system, mostly during the day. Writes would be spread throughout the day and would be a function of user activity, with some bulk load activity from time to time. The WRITE portion of the database would be an order of magnitude larger than the READ portion. The READ portion would have an order of magnitude higher traffic, except during periodic bulk loads.

On one hand, if I were to have a single cluster I would have more resources for the users and potentially better scalability. A single cluster may need fewer servers overall, provided write activity does not affect reads. On the other hand, write activity and the associated memory consumption and GC, as well as maintenance routines, may affect the READ system. The system will be hosted on EC2.

I would appreciate any thoughts.

Regards,
Oleg

Re: One or Two clusters?

Posted by aaron morton <aa...@thelastpickle.com>.
Use one cluster. Use lots-o-machines.

The read and write paths do not directly interfere with each other like they do in an RDBMS. Compaction triggered by writes can suck up disk IO, but it is throttled, so in practice it is not such a big problem. Excessive GC caused by reads or compaction may slow down the server, but you will want to avoid that anyway.

The one caveat is that it depends on how you are transforming the data. If you are using Hadoop, consider creating a single cluster with multiple DCs (as DataStax does): one for OLTP and one for OLAP. Do the Hadoop work in the OLAP DC and have the online app read and write to the OLTP one.
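
As a rough illustration of the single-cluster, two-DC layout, here is a minimal Python sketch that builds a keyspace definition replicated across both data centers with NetworkTopologyStrategy. The keyspace name `events`, the DC names `OLTP` and `OLAP`, the replica counts, and the helper `keyspace_cql` are all assumptions made for the example and do not come from this thread. The statement uses the CQL form found in later Cassandra releases, so treat it as a sketch rather than something to paste into a 2012-era install.

```python
def keyspace_cql(name, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a data center name to its replica count.
    All names passed in are hypothetical; use the DC names your snitch reports.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    # Sort the DCs so the generated statement is deterministic.
    options = ", ".join(
        "'{}': {}".format(dc, rf) for dc, rf in sorted(replicas_per_dc.items())
    )
    return (
        "CREATE KEYSPACE {} WITH replication = "
        "{{'class': 'NetworkTopologyStrategy', {}}};".format(name, options)
    )


# Assumed layout: 3 replicas in the online DC, 2 in the analytics DC.
stmt = keyspace_cql("events", {"OLTP": 3, "OLAP": 2})
print(stmt)
```

With a layout like this, the online application would connect only to nodes in the OLTP DC, while the Hadoop jobs would run against the OLAP DC, so the batch work stays off the nodes serving users.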

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
