Posted to user@cassandra.apache.org by ja...@legtux.org on 2019/10/01 11:40:38 UTC

Sizing a cluster

Hi,
We want to use Cassandra to store camera detections. The size of a picture + data is about 0.5 MB. We are starting with 5 devices, but we are targeting 50 devices for next year, and could go up to 1000. To summarize:
	*  Number of sources: 5 - 50 - 1000 (src)  
	*  Frequency of data: 1Hz (f)  
	*  Estimated size of data: 0.5MB (s)  
	*  Replication factor: 3 (RF)  
I calculated the size per year,
	* src * f * 60 * 60 * 24 * 365 * s * RF
gives me,
	* 5 sources = 0.24 PB per year 
	* 50 sources = 2.4 PB per year 
	* 1000 sources = 47.3 PB per year 
So if I respect the rule of 2 TB per node, I get 120 nodes in the simplest case (5 sources). Am I right?
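
The arithmetic above can be reproduced with a short script. This is only a sketch: it assumes decimal units (1 TB = 10^6 MB) and that the replication factor is part of the product, since the quoted figures only come out that way. The function and constant names are illustrative, and compaction headroom and compression are not taken into account.

```python
import math

SECONDS_PER_YEAR = 60 * 60 * 24 * 365
MB_PER_TB = 1_000_000  # assumption: decimal units, 1 TB = 10^6 MB
TB_PER_PB = 1_000


def yearly_storage_tb(sources, frequency_hz=1.0, size_mb=0.5, replication_factor=3):
    """Replicated data written in one year, in TB (no compression, no headroom)."""
    total_mb = sources * frequency_hz * SECONDS_PER_YEAR * size_mb * replication_factor
    return total_mb / MB_PER_TB


def nodes_needed(total_tb, tb_per_node=2.0):
    """Number of nodes if each node stores at most tb_per_node of data."""
    return math.ceil(total_tb / tb_per_node)


for sources in (5, 50, 1000):
    total_tb = yearly_storage_tb(sources)
    print(f"{sources:>4} sources: {total_tb / TB_PER_PB:.2f} PB/year, "
          f"{nodes_needed(total_tb)} nodes at 2 TB each")
```

For 5 sources this gives about 236.5 TB per year and 119 nodes, which matches the rounded figure of 120. Note that this covers a single year of data; keeping several years multiplies the node count accordingly.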

regards,
Nicolas Jäger

Re: Sizing a cluster

Posted by ja...@legtux.org.
Hi Léo, thanks for the links,

> Is that the size of the uncompressed data or the data once it has been inserted and compressed by Cassandra?

The 0.5 MB is the size of the data we send, before Cassandra applies any compression.

> Looking at the Cassandra compression documentation (http://cassandra.apache.org/doc/latest/operating/compression.html) and testing different parameters on a test cluster might be interesting before you do the sizing of the final production cluster.

We are in the dev phase and have two small clusters. I haven't yet taken compression into account. For compaction, I roughly assumed that we need 50% extra space per node (this extra space is not included in the calculation in my last email).
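
To see how the 50% compaction allowance interacts with the disk size per node, here is a rough sketch. It assumes "50% extra space" means 1 TB of data needs 1.5 TB of disk. The `compression_ratio` parameter is hypothetical (on-disk size divided by original size, 1.0 meaning no gain), and the 236.52 TB input is the one-year, RF 3 figure for 5 sources recomputed from the numbers in the first email.

```python
import math


def nodes_with_headroom(raw_tb, disk_tb_per_node, headroom_fraction=0.5,
                        compression_ratio=1.0):
    """Estimate the node count when part of each disk is reserved for compaction.

    headroom_fraction: extra space relative to stored data (0.5 means
        1 TB of data needs 1.5 TB of disk). Assumed interpretation.
    compression_ratio: on-disk size / original size (hypothetical input).
    """
    on_disk_tb = raw_tb * compression_ratio
    usable_tb_per_node = disk_tb_per_node / (1.0 + headroom_fraction)
    return math.ceil(on_disk_tb / usable_tb_per_node)


raw_tb = 236.52  # 5 sources * 1 Hz * 0.5 MB * RF 3 over one year

# 2 TB disks: the headroom eats into the 2 TB, so more nodes are needed.
print(nodes_with_headroom(raw_tb, disk_tb_per_node=2.0))
# 3 TB disks: 2 TB of data plus 1 TB of headroom per node.
print(nodes_with_headroom(raw_tb, disk_tb_per_node=3.0))
```

With 2 TB disks the estimate rises to 178 nodes, while 3 TB disks keep it at 119. Whether 2 TB is meant as data per node or disk per node therefore changes the answer noticeably.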
On 1 October 2019 at 08:58, "Léo FERLIN SUTTON" <lferlin@mailjet.com.invalid> wrote:
> Hi!
> I'm not an expert, but don't forget that Cassandra needs space to do its compactions.
> Take a look at the worst-case scenarios from this DataStax grid: https://docs.datastax.com/en/dse-planning/doc/planning/capacityPlanning.html#capacityPlanning__disk
>> The size of a picture + data is about 0.5MB
> Is that the size of the uncompressed data or the data once it has been inserted and compressed by Cassandra?
> Looking at the Cassandra compression documentation (http://cassandra.apache.org/doc/latest/operating/compression.html) and testing different parameters on a test cluster might be interesting before you do the sizing of the final production cluster.
> Regards,
> Leo

Re: Sizing a cluster

Posted by Léo FERLIN SUTTON <lf...@mailjet.com.INVALID>.
Hi!

I'm not an expert, but don't forget that Cassandra needs space to do its
compactions.

Take a look at the worst-case scenarios from this DataStax grid:
https://docs.datastax.com/en/dse-planning/doc/planning/capacityPlanning.html#capacityPlanning__disk

> The size of a picture + data is about 0.5MB

Is that the size of the uncompressed data or the data once it has been
inserted and compressed by Cassandra?
Looking at the Cassandra compression documentation
(http://cassandra.apache.org/doc/latest/operating/compression.html) and
testing different parameters on a test cluster might be interesting before
you do the sizing of the final production cluster.
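
Before testing on a cluster, a rough local check can show whether the payload is compressible at all. The sketch below is not how Cassandra measures anything: it uses Python's zlib as a stand-in compressor (Cassandra's default, LZ4, is not in the Python standard library) and compresses fixed-size chunks independently to loosely mimic chunked SSTable compression. The payloads are synthetic placeholders, and real pictures and metadata should be substituted.

```python
import os
import zlib


def chunked_compression_ratio(payload: bytes, chunk_kb: int = 64, level: int = 6) -> float:
    """Compressed size / original size, compressing each chunk independently."""
    if not payload:
        raise ValueError("payload must not be empty")
    chunk_size = chunk_kb * 1024
    compressed_bytes = 0
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        compressed_bytes += len(zlib.compress(chunk, level))
    return compressed_bytes / len(payload)


# Synthetic stand-ins (hypothetical): random bytes behave like data that is
# already compressed, repeated text behaves like highly redundant metadata.
image_like = os.urandom(512 * 1024)
metadata_like = b'{"camera": 1, "label": "car"}' * 10_000

print(f"image-like:    {chunked_compression_ratio(image_like):.2f}")
print(f"metadata-like: {chunked_compression_ratio(metadata_like):.2f}")
```

If the pictures are stored in an already compressed format, the ratio is likely to stay close to 1.0, in which case table compression would save little disk space and the sizing would barely change.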

Regards,

Leo
