Posted to user@cassandra.apache.org by Mimi Aluminium <mi...@gmail.com> on 2011/01/20 17:04:10 UTC

memory size and disk size prediction tool

Hi,

We are implementing a 'middleware' layer on top of an underlying storage
system and need to estimate costs for various system configurations.
Specifically, I want to estimate the resources (memory, disk) required by
our data model.

Is there a tool that, given certain storage configuration parameters,
column family details (field counts and sizes), and workload-dependent
parameters such as average read/write rates, can predict the resource
consumption (i.e., memory and disk) offline?

Thanks,
Miriam

Re: memory size and disk size prediction tool

Posted by Mimi Aluminium <mi...@gmail.com>.
Aaron,
Thanks a lot for your answer.
I had in mind something more generic, which I am currently working on.
The idea is to have a tool with GUI screens where you can feed in the
various column families you are using, with column (name and value) sizes.
It would then have another screen with application-aware field names and
their associated values, all defined by the user. Using these parameters,
the modeling tool should be able to calculate disk usage and hopefully RAM
usage as well.
For now I am trying to do this for our own case using a simple Excel
spreadsheet; I will let you know when it is ready.
Thanks,
Miriam
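
A minimal sketch of the calculation such a spreadsheet or tool would do, in
Python. The overhead constants are assumptions: ~15 bytes per column (8-byte
timestamp, 1 flag byte, and the name/value length fields) and ~23 bytes per
row are figures commonly cited for the 0.6/0.7-era SSTable format, not
numbers from this thread, so verify them against your Cassandra version.
The inputs at the bottom are hypothetical.

# Rough disk-usage model in the spirit of the spreadsheet described above.
COLUMN_OVERHEAD = 15  # timestamp (8) + flags (1) + name len (2) + value len (4), assumed
ROW_OVERHEAD = 23     # per-row index/deletion bookkeeping, assumed

def column_size(name_bytes, value_bytes):
    return COLUMN_OVERHEAD + name_bytes + value_bytes

def row_size(key_bytes, columns):
    """columns: list of (name_bytes, value_bytes) tuples."""
    return ROW_OVERHEAD + key_bytes + sum(column_size(n, v) for n, v in columns)

def cf_size(row_count, key_bytes, columns):
    return row_count * row_size(key_bytes, columns)

# Hypothetical example: 10M rows, 16-byte keys, five columns with
# 8-byte names and 64-byte values.
raw = cf_size(10_000_000, 16, [(8, 64)] * 5)
print(f"raw data size: {raw / 2**30:.1f} GiB")  # ~4.4 GiB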

On Thu, Jan 20, 2011 at 11:49 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> Not that I know of. Do you have an existing test system you can use as a
> baseline?
>
> For memory, have a read of the JVM Heap Size section here:
> http://wiki.apache.org/cassandra/MemtableThresholds
> You will also want to leave some memory for disk caching and the OS. 8 or
> 12 GB feels like a good start.
>
> For disk capacity I just did some regular old guesswork, and multiplied my
> number by 1.25 to cover the on-disk overhead. You also want to avoid using
> more than 50% of the local disk space, due to compaction and the way disk
> performance falls away. There is more info available here:
> http://wiki.apache.org/cassandra/CassandraHardware
>
> How much throughput do you need? How much redundancy do you need? How much
> data do you plan to store?
>
> Hope that helps
> Aaron
>
> On 21 Jan, 2011, at 05:04 AM, Mimi Aluminium <mi...@gmail.com>
> wrote:
>
> Hi,
>
> We are implementing a 'middleware' layer on top of an underlying storage
> system and need to estimate costs for various system configurations.
> Specifically, I want to estimate the resources (memory, disk) required by
> our data model.
>
> Is there a tool that, given certain storage configuration parameters,
> column family details (field counts and sizes), and workload-dependent
> parameters such as average read/write rates, can predict the resource
> consumption (i.e., memory and disk) offline?
>
> Thanks,
> Miriam
>
>

Re: memory size and disk size prediction tool

Posted by Aaron Morton <aa...@thelastpickle.com>.
Not that I know of. Do you have an existing test system you can use as a baseline?

For memory, have a read of the JVM Heap Size section here: http://wiki.apache.org/cassandra/MemtableThresholds
You will also want to leave some memory for disk caching and the OS. 8 or 12 GB feels like a good start.
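
To put numbers on the heap side: a back-of-envelope sketch assuming the rule
of thumb from the MemtableThresholds page linked above (roughly memtable
throughput in MB x 3 x the number of actively written column families, plus
~1 GB for Cassandra internals and whatever key/row caches you configure).
Treat the formula and the sample inputs as assumptions, not exact guidance.

def heap_estimate_mb(memtable_throughput_mb, hot_cfs, cache_mb=0):
    # ~3x per memtable covers the live memtable plus copies awaiting flush
    return memtable_throughput_mb * 3 * hot_cfs + 1024 + cache_mb

# e.g. 64 MB memtables, 4 hot column families, 256 MB of caches
print(heap_estimate_mb(64, 4, 256), "MB")  # -> 2048 MB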

For disk capacity I just did some regular old guesswork, and multiplied my
number by 1.25 to cover the on-disk overhead. You also want to avoid using
more than 50% of the local disk space, due to compaction and the way disk
performance falls away. There is more info available here:
http://wiki.apache.org/cassandra/CassandraHardware
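
Worked as a quick calculation: pad the raw estimate by 1.25 for on-disk
overhead, multiply by the replication factor, and then keep each node below
50% utilization to leave room for compaction. The 200 GB raw figure and RF=3
are made-up inputs for illustration, not numbers from this thread.

def disk_capacity_gb(raw_data_gb, replication_factor=3,
                     overhead=1.25, max_utilization=0.5):
    stored = raw_data_gb * overhead * replication_factor  # data as written to disk
    return stored / max_utilization                       # capacity to provision

print(disk_capacity_gb(200))  # 200 * 1.25 * 3 = 750 GB stored -> 1500.0 GB needed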

How much throughput do you need? How much redundancy do you need? How much
data do you plan to store?

Hope that helps
Aaron

On 21 Jan, 2011, at 05:04 AM, Mimi Aluminium <mi...@gmail.com> wrote:

Hi,
We are implementing a 'middleware' layer on top of an underlying storage
system and need to estimate costs for various system configurations.
Specifically, I want to estimate the resources (memory, disk) required by
our data model.
Is there a tool that, given certain storage configuration parameters,
column family details (field counts and sizes), and workload-dependent
parameters such as average read/write rates, can predict the resource
consumption (i.e., memory and disk) offline?
Thanks,
Miriam