Posted to user@cassandra.apache.org by kannan chandrasekaran <ck...@yahoo.com> on 2010/09/12 21:56:28 UTC

Couple of cache related questions

1) What determines the amount of memory used per schema, ignoring the general
overhead to get Cassandra up and running? Is it just the size of the caches for
the column family + the memtable size?

2) Is the configured cache size (in absolute numbers or percentages) an upper
bound on the amount of memory that can be allocated, with usage growing as more
data fills the cache? I believe the answer is yes; please correct me if I am
wrong. Assuming the answer is yes, what if I specify the cache size as X items
and there is only enough memory to allocate for, say, X-1000 items? Will
Cassandra just allocate for X-1000 and keep swapping cache items in and out as
required? Is there a possibility of a crash due to lack of memory?

3) Taking this one step further, if there is insufficient memory to allocate
caches across column families (and across keyspaces), will Cassandra take memory
from one cache and allocate it to another as required? (A little
over-ambitious, but I thought I would ask instead of assuming.)


Thank you
Kannan


      

Re: Couple of cache related questions

Posted by kannan chandrasekaran <ck...@yahoo.com>.
Thanks a lot Jonathan !!!

Kannan







      

Re: Couple of cache related questions

Posted by Jonathan Ellis <jb...@gmail.com>.
On Sun, Sep 12, 2010 at 6:10 PM, kannan chandrasekaran
<ck...@yahoo.com> wrote:
>> 1) What determines the amount of memory used per schema ignoring the
>> general
>> overhead to get cassandra up and running?  Is it just the size of the
>> caches
>> for the column Family + the memtable size ?
>
> and the bloom filter and index samples from the sstable files.
>
> Does that mean that cassandra tries to load the index and filter tables in
> memory as well, for each sstable in the keyspace?

it means it loads the bloom filter file, and a sample from the index file.
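
To illustrate what "a sample from the index file" can look like, here is a minimal sketch of a sampled index: only every Nth key is held in memory, and a lookup uses the sample to find where to start scanning the full on-disk index. This is not Cassandra's code. The function names, the key format, and the interval of 128 are assumptions chosen for illustration.

```python
import bisect


def build_index_sample(sorted_keys, interval=128):
    """Keep every `interval`-th (key, position) pair in memory.

    `sorted_keys` stands in for the full on-disk index; `interval` is an
    assumed value, not read from any real configuration.
    """
    return [(key, pos) for pos, key in enumerate(sorted_keys)
            if pos % interval == 0]


def scan_start(sample, key):
    """Return the position in the full index from which to scan for `key`."""
    sample_keys = [k for k, _ in sample]
    # Rightmost sampled key that is <= the key we are looking for.
    i = bisect.bisect_right(sample_keys, key) - 1
    return sample[max(i, 0)][1]


# Hypothetical index of 10,000 sorted keys.
full_index = [f"key{n:06d}" for n in range(10_000)]
sample = build_index_sample(full_index, interval=128)
start = scan_start(sample, "key000300")
print(len(full_index), len(sample), start)
```

The memory cost of the sample scales with the number of keys divided by the interval, which is why it is much smaller than the index file itself but still grows with the amount of data on disk.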

> Once the final memtable is flushed to the disk ( assuming no more writes) ,
> does read path also incur the memory size of the memtable for that
> particular CF ?

no.

> Does cassandra try to preallocate memory after startup for each schema even
> if its not used ( not being currently written to or read from)  ?

no.

> If I understand you correctly then I need to make sure that
>  the sum of sizes of all items in the cache across all the keyspaces +
> memtable + bloom filter + index samples  < Heap space

yes.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
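
The rule confirmed above (caches + memtables + bloom filters + index samples must fit in the heap) can be sketched as simple arithmetic. This is a rough illustration only: the class and function names, the per-column-family figures, and the headroom fraction are all hypothetical, and real usage also includes JVM object overhead that this does not model.

```python
from dataclasses import dataclass


@dataclass
class ColumnFamilyEstimate:
    """Hypothetical per-column-family memory estimate, all sizes in bytes."""
    name: str
    cache_items: int         # configured cache size, as an item count
    avg_item_bytes: int      # assumed average size of one cached item
    memtable_bytes: int      # assumed memtable size before flush
    bloom_filter_bytes: int  # bloom filters for all sstables of this CF
    index_sample_bytes: int  # sampled index entries for all sstables

    def total_bytes(self) -> int:
        cache_bytes = self.cache_items * self.avg_item_bytes
        return (cache_bytes + self.memtable_bytes
                + self.bloom_filter_bytes + self.index_sample_bytes)


def fits_in_heap(cfs, heap_bytes, headroom=0.25):
    """Return (fits, needed_bytes), keeping `headroom` of the heap free.

    The headroom fraction is an assumption, not a Cassandra setting.
    """
    needed = sum(cf.total_bytes() for cf in cfs)
    budget = int(heap_bytes * (1 - headroom))
    return needed <= budget, needed


MB = 1024 * 1024
# Made-up numbers for two column families.
cfs = [
    ColumnFamilyEstimate("users", 100_000, 512, 64 * MB, 8 * MB, 4 * MB),
    ColumnFamilyEstimate("events", 200_000, 1024, 64 * MB, 16 * MB, 8 * MB),
]
ok, needed = fits_in_heap(cfs, heap_bytes=1024 * MB)
print(ok, needed // MB)
```

Summing over every configured column family, rather than only the busy ones, gives a conservative bound: caches and memtables for idle column families stay empty, but bloom filters and index samples are loaded for any sstables that exist on disk.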

Re: Couple of cache related questions

Posted by kannan chandrasekaran <ck...@yahoo.com>.
Thanks for the replies, Jonathan. A couple more clarifications (my new
questions are the unquoted text below).

________________________________
From: Jonathan Ellis <jb...@gmail.com>
To: user@cassandra.apache.org
Sent: Sun, September 12, 2010 1:47:09 PM
Subject: Re: Couple of cache related questions

> On Sun, Sep 12, 2010 at 2:56 PM, kannan chandrasekaran
> <ck...@yahoo.com> wrote:
>> 1) What determines the amount of memory used per schema ignoring the general
>> overhead to get cassandra up and running?  Is it just the size of the caches
>> for the column Family + the memtable size ?
>
> and the bloom filter and index samples from the sstable files.

Does that mean that Cassandra tries to load the index and filter tables into
memory as well, for each sstable in the keyspace?

Once the final memtable is flushed to disk (assuming no more writes), does the
read path also incur the memory size of the memtable for that particular CF?

Does Cassandra try to preallocate memory after startup for each schema even if
it's not used (not currently being written to or read from)?

I apologize for so many questions. Here is what I am trying to do:
I might need more than one schema to be configured, and I am wondering whether
Cassandra will take up memory proportional to the number of schemas
"configured" as opposed to the ones "currently in use". This in turn will help
me decide on the maximum number of keyspaces that I can configure within a
given heap size.

>> 2) Is the size of the cache configured ( in terms of absolute numbers or
>> percentages), an upper bound on the amount of memory that can be allocated
>> and which grows as more data is filled up in the cache ?
>
> no.  it's strictly the number of items you give it.  so you need to be
> careful not to make it larger than you have room in the heap.

If I understand you correctly, then I need to make sure that
the sum of sizes of all items in the cache across all the keyspaces + memtable
+ bloom filter + index samples < heap space.

Thanks once again.



      

Re: Couple of cache related questions

Posted by Jonathan Ellis <jb...@gmail.com>.
On Sun, Sep 12, 2010 at 2:56 PM, kannan chandrasekaran
<ck...@yahoo.com> wrote:
> 1) What determines the amount of memory used per schema ignoring the general
> overhead to get cassandra up and running?  Is it just the size of the caches
> for the column Family + the memtable size ?

and the bloom filter and index samples from the sstable files.

> 2) Is the size of the cache configured ( in terms of absolute numbers or
> percentages), an upper bound on the amount of memory that can be allocated
> and which grows as more data is filled up in the cache ?

No. It's strictly the number of items you give it, so you need to be
careful not to make it larger than you have room for in the heap.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
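
To make the "number of items, not amount of memory" point concrete, here is a minimal sketch of a cache bounded only by item count. It is not Cassandra's cache implementation. The class name, the LRU eviction policy, and the item sizes are assumptions for illustration.

```python
from collections import OrderedDict


class ItemCountCache:
    """LRU cache bounded by number of items, not by bytes (illustrative only)."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._data = OrderedDict()

    def __len__(self):
        return len(self._data)

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
        self._data[key] = value
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict least recently used

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]

    def payload_bytes(self):
        """Sum of value sizes only; ignores per-object overhead."""
        return sum(len(v) for v in self._data.values())


# Same item limit, very different memory use depending on item size.
small = ItemCountCache(max_items=1000)
large = ItemCountCache(max_items=1000)
for i in range(5000):
    small.put(i, b"x" * 100)
    large.put(i, b"x" * 10_000)

print(len(small), small.payload_bytes())
print(len(large), large.payload_bytes())
```

Both caches stop at the same item count, but the second holds a hundred times more data. A count-based limit therefore has to be sized against the expected item size, since nothing in the cache itself stops it from outgrowing the heap.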