Posted to user@cassandra.apache.org by kannan chandrasekaran <ck...@yahoo.com> on 2010/09/12 21:56:28 UTC
Couple of cache related questions
1) What determines the amount of memory used per schema, ignoring the general
overhead to get Cassandra up and running? Is it just the size of the caches for
the column family + the memtable size?
2) Is the configured cache size (in terms of absolute numbers or
percentages) an upper bound on the amount of memory that can be allocated,
with usage growing as more data fills the cache? I believe the answer is
yes; please correct me if I am wrong. Assuming the answer is yes, what if I
specify the cache size as X items and there is only enough memory to allocate
for, say, X-1000 items? Will Cassandra just allocate for X-1000 and keep
swapping cache items in and out as required? Is there a possibility of a crash
due to lack of memory?
3) Taking this one step further, if there is insufficient memory to allocate
caches across column families (and across keyspaces), will Cassandra pull memory
from one cache and allocate it to the other as required? (A little
over-ambitious, but I thought I would ask instead of assuming.)
Thank you
Kannan
Re: Couple of cache related questions
Posted by kannan chandrasekaran <ck...@yahoo.com>.
Thanks a lot Jonathan !!!
Kannan
________________________________
From: Jonathan Ellis <jb...@gmail.com>
To: user@cassandra.apache.org
Sent: Mon, September 13, 2010 4:47:05 PM
Subject: Re: Couple of cache related questions
Re: Couple of cache related questions
Posted by Jonathan Ellis <jb...@gmail.com>.
On Sun, Sep 12, 2010 at 6:10 PM, kannan chandrasekaran
<ck...@yahoo.com> wrote:
>> 1) What determines the amount of memory used per schema ignoring the
>> general
>> overhead to get cassandra up and running? Is it just the size of the
>> caches
>> for the column Family + the memtable size ?
>
> and the bloom filter and index samples from the sstable files.
>
> Does that mean that cassandra tries to load the index and filter tables in
> memory as well, for each sstable in the keyspace?
it means it loads the bloom filter file, and a sample from the index file.
> Once the final memtable is flushed to the disk ( assuming no more writes) ,
> does read path also incur the memory size of the memtable for that
> particular CF ?
no.
> Does cassandra try to preallocate memory after startup for each schema even
> if its not used ( not being currently written to or read from) ?
no.
> If I understand you correctly then I need to make sure that
> the sum of sizes of all items in the cache across all the keyspaces +
> memtable + bloom filter + index samples < Heap space
yes.
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
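As a rough illustration of the rule confirmed above (caches + memtables + bloom filters + index samples must stay below the heap), the check can be sketched in a few lines of Python. Everything here is hypothetical: the class name, the field names, and especially the headroom fraction, which is an assumption added to leave room for the general overhead the original question set aside. None of it is a Cassandra API or a Cassandra default.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ColumnFamilyEstimate:
    """Hypothetical per-column-family memory estimate, all values in bytes."""
    name: str
    cache_bytes: int          # estimated size of all cached items
    memtable_bytes: int       # memtable size for this column family
    bloom_filter_bytes: int   # bloom filters of its sstables
    index_sample_bytes: int   # index samples of its sstables

    def total(self) -> int:
        return (self.cache_bytes + self.memtable_bytes
                + self.bloom_filter_bytes + self.index_sample_bytes)


def fits_in_heap(cfs: Iterable[ColumnFamilyEstimate], heap_bytes: int,
                 headroom_fraction: float = 0.25) -> bool:
    """True if the summed estimates stay below the usable part of the heap.

    headroom_fraction is an assumed safety margin, not a Cassandra setting.
    """
    usable = heap_bytes * (1 - headroom_fraction)
    return sum(cf.total() for cf in cfs) < usable
```

The sum runs over every column family in every keyspace, so adding keyspaces only matters to the extent that they actually hold caches, memtables, or sstables.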
Re: Couple of cache related questions
Posted by kannan chandrasekaran <ck...@yahoo.com>.
Thanks for the replies, Jonathan. A couple more clarifications (in bold):
________________________________
From: Jonathan Ellis <jb...@gmail.com>
To: user@cassandra.apache.org
Sent: Sun, September 12, 2010 1:47:09 PM
Subject: Re: Couple of cache related questions
On Sun, Sep 12, 2010 at 2:56 PM, kannan chandrasekaran
<ck...@yahoo.com> wrote:
>> 1) What determines the amount of memory used per schema ignoring the general
>> overhead to get cassandra up and running? Is it just the size of the caches
>> for the column Family + the memtable size ?
>
> and the bloom filter and index samples from the sstable files.
>
Does that mean that cassandra tries to load the index and filter tables in
memory as well, for each sstable in the keyspace?
Once the final memtable is flushed to the disk ( assuming no more writes) , does
read path also incur the memory size of the memtable for that particular CF ?
Does cassandra try to preallocate memory after startup for each schema even if
its not used ( not being currently written to or read from) ?
I apologize for so many questions,here is what I am trying to do ....
I might need more than one schema to be configured and wondering if cassandra
will take up memory proportional to the number of schemas "configured" as
opposed to the ones "currently in use". This in-turn will help me decide on the
maximum number of keyspaces that I can configure within a given heap size.
>
>> 2) Is the size of the cache configured ( in terms of absolute numbers or
>> percentages), an upper bound on the amount of memory that can be allocated
>> and which grows as more data is filled up in the cache ?
>
> no. it's strictly the number of items you give it. so you need to be
> careful not to make it larger than you have room in the heap.
>
If I understand you correctly then I need to make sure that
the sum of sizes of all items in the cache across all the keyspaces + memtable
+ bloom filter + index samples < Heap space
Thanks once again.
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
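For the per-sstable part of the budget (the bloom filter, plus a sample of the index rather than the whole index), a back-of-the-envelope estimate might look like the sketch below. The function names are hypothetical, and the bits per key, sampling interval, and bytes per sampled entry are placeholder assumptions for illustration, not values taken from this thread or from Cassandra's configuration.

```python
import math


def bloom_filter_bytes(keys: int, bits_per_key: int = 10) -> int:
    """Approximate bloom filter size for one sstable.

    bits_per_key is an assumed placeholder; a filter needs roughly
    keys * bits_per_key bits regardless of how large the rows are.
    """
    return math.ceil(keys * bits_per_key / 8)


def index_sample_bytes(keys: int, sample_interval: int = 128,
                       avg_entry_bytes: int = 48) -> int:
    """Approximate heap used by an index sample for one sstable.

    Only every sample_interval-th index entry is assumed to be held in
    memory; both default values are placeholders.
    """
    sampled_entries = math.ceil(keys / sample_interval)
    return sampled_entries * avg_entry_bytes


# Example: one sstable holding a million keys.
keys = 1_000_000
per_sstable = bloom_filter_bytes(keys) + index_sample_bytes(keys)
```

Both terms scale with the number of keys on disk, so they stay resident even for a column family that is no longer being written to.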
Re: Couple of cache related questions
Posted by Jonathan Ellis <jb...@gmail.com>.
On Sun, Sep 12, 2010 at 2:56 PM, kannan chandrasekaran
<ck...@yahoo.com> wrote:
> 1) What determines the amount of memory used per schema ignoring the general
> overhead to get cassandra up and running? Is it just the size of the caches
> for the column Family + the memtable size ?
and the bloom filter and index samples from the sstable files.
> 2) Is the size of the cache configured ( in terms of absolute numbers or
> percentages), an upper bound on the amount of memory that can be allocated
> and which grows as more data is filled up in the cache ?
No. It's strictly the number of items you give it, so you need to be
careful not to make it larger than you have room for in the heap.
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
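Since the cache limit is a count of items rather than a number of bytes, its heap cost depends on how large the cached items are. A minimal sketch of that arithmetic follows; the function names, the average item size, and the per-entry overhead are all assumptions to be replaced with figures measured on real data.

```python
def cache_heap_bytes(items: int, avg_item_bytes: int,
                     per_entry_overhead_bytes: int = 64) -> int:
    """Rough heap footprint of a cache capped at `items` entries.

    per_entry_overhead_bytes is an assumed allowance for object and map
    overhead, not a measured value.
    """
    return items * (avg_item_bytes + per_entry_overhead_bytes)


def max_items_for_budget(budget_bytes: int, avg_item_bytes: int,
                         per_entry_overhead_bytes: int = 64) -> int:
    """Largest item count whose estimated footprint fits in budget_bytes."""
    return budget_bytes // (avg_item_bytes + per_entry_overhead_bytes)


# Example: 200,000 cached items averaging 1 KiB each.
estimated = cache_heap_bytes(200_000, 1024)
```

Running the calculation in the other direction with max_items_for_budget gives an item count to configure for a given share of the heap.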