Posted to user@cassandra.apache.org by openvictor Open <op...@gmail.com> on 2011/04/01 16:34:00 UTC

Abnormal memory consumption

Hello everybody,

I am quite new to Cassandra and I am worried about an Apache Cassandra
server that is running on a small isolated server with only 2 GB of RAM. On
this server there is very little data in Cassandra (~3 MB, only text in
column values), but there are other services such as SolR, Tomcat, Redis
and PostgreSQL. There are quite a lot of column families (about 15), but
some column families are empty at the moment. At the moment memory
consumption is 484 MB resident and 948,556 KB virtual.

I modified storage-conf.xml (I am running Apache Cassandra 0.6.11): I set
DiskAccessMode to standard since I am running on 64-bit Debian. I also set
the MemtableThroughput to 16 MB instead of 64 MB, and I lowered the Xms and
Xmx values to 128M and 256M.
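
For reference, on the 0.6.x line those heap bounds are JVM flags in
bin/cassandra.in.sh rather than in storage-conf.xml; a minimal sketch of the
change described above, assuming a stock 0.6.11 layout (hypothetical excerpt,
other stock flags omitted):

    # Sketch of the heap bounds in bin/cassandra.in.sh (assumed 0.6.x layout);
    # only the heap settings are shown here.
    JVM_OPTS=" \
            -ea \
            -Xms128M \
            -Xmx256M"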

My question is: where does this giant memory overhead come from (484 MB
for 3 MB of data seems insane)? And more importantly: how can I set
Cassandra to use at most, let's say, 500 MB? Because at this rate Cassandra
will be well over that limit soon.
For information: because of security constraints I cannot use JMX, unless
there is a way to use JMX through SSH without a graphical interface.

Thank you for your help.
Victor

RE: Abnormal memory consumption

Posted by Jeremiah Jordan <JE...@morningstar.com>.
Connect with jconsole and watch the memory consumption graph.  Click the
force-GC button and watch what the low point is: that is how much memory
is being used for persistent stuff; the rest is garbage generated while
satisfying queries.  Run a query and watch how the graph spikes up: that
is how much is needed for the query.  Like others have said, Cassandra
isn't using 600 MB of RAM; the Java Virtual Machine is using 600 MB of
RAM, because your settings told it it could.  The JVM will use as much
memory as your settings allow it to.  If you really are putting that
little data into your test server, you should be able to tune everything
down to only 256 MB easily (I do this for test instances of Cassandra
that I spin up to run some tests on), maybe further.
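
If a GUI is not an option (as above, JMX is locked down), a rough
command-line equivalent, assuming standard JDK tools on the node and that
pgrep can find the daemon process:

    # Sample heap occupancy per generation every second:
    jstat -gcutil $(pgrep -f CassandraDaemon) 1s
    # Force a full collection to find the "low point" (a side effect
    # of the -histo:live histogram):
    jmap -histo:live $(pgrep -f CassandraDaemon) > /dev/null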
 
-Jeremiah

________________________________

From: openvictor Open [mailto:openvictor@gmail.com]
Sent: Wednesday, April 06, 2011 7:59 PM
To: user@cassandra.apache.org
Subject: Re: Abnormal memory consumption


Hello Paul,

Thank you for the tip. The random port attribution policy of JMX was
really driving me mad! Good to know there is a solution for that
problem.

Concerning the rest of the conversation, my only concern is that as an
administrator and a student it is hard to constantly watch Cassandra
instances so that they don't crash. As much as I love the principle of
Cassandra, being constantly afraid of memory consumption is an issue in
my opinion. That being said, I took a new 16 GB server today, but I
don't want Cassandra to eat up everything if it is not needed, because
Cassandra will have some neighbors on this server, such as Tomcat and SolR.
And for me it is very weird that on my small instance, where I put a lot
of constraints (such as a MemtableThroughput of 6 MB), Cassandra uses
600 MB of RAM for 6 MB of data. It seems a little bit of an overkill to
me... And so far I have failed to find any information on what this
massive overhead can be...

Thank you for your answers and for taking the time to answer my
questions.


2011/4/6 Paul Choi <pa...@plaxo.com>

> You can use JMX over ssh by doing this:
> http://blog.reactive.org/2011/02/connecting-to-cassandra-jmx-via-ssh.html
> Basically, you use SSH -D to do dynamic application port forwarding.
>
> In terms of scaling, you'll be able to afford 120GB RAM/node in 3 years
> if you're successful. Or, a machine with much less RAM and flash-based
> storage. :)
> Seriously, though, the formula in the tuning guidelines is a guideline.
> You can probably get acceptable performance with much less. If not, you
> can shard your app such that you host a few CFs per cluster. I doubt
> you'll need to though.
>
> From: openvictor Open <op...@gmail.com>
> Reply-To: <us...@cassandra.apache.org>
> Date: Mon, 4 Apr 2011 18:24:25 -0400
> To: <us...@cassandra.apache.org>
> Subject: Re: Abnormal memory consumption
>
> Okay, I see. But isn't there a big issue for scaling here?
> Imagine that I am the developer of a certain very successful website:
> at year 1 I need 20 CFs and might need 8 GB of RAM. At year 2 I need
> 50 CFs because I added functionality to my wonderful website; will I
> need 20 GB of RAM? And if at year three I have 300 column families,
> will I need 120 GB of RAM per node? Or did I miss something about
> memory consumption?
>
> Thank you very much,
>
> Victor
>
> 2011/4/4 Peter Schuller <pe...@infidyne.com>
>
>> > And about production: is 7 GB of RAM sufficient? Or is 11 GB the
>> > minimum?
>> > Thank you for your inputs for the JVM; I'll try to tune that.
>>
>> Production mem reqs are mostly dependent on memtable thresholds:
>>
>>   http://www.datastax.com/docs/0.7/operations/tuning
>>
>> If you enable key caching or row caching, you will have to adjust
>> accordingly as well.
>>
>> --
>> / Peter Schuller

Re: Abnormal memory consumption

Posted by openvictor Open <op...@gmail.com>.
Hello Paul,

Thank you for the tip. The random port attribution policy of JMX was really
driving me mad! Good to know there is a solution for that problem.

Concerning the rest of the conversation, my only concern is that as an
administrator and a student it is hard to constantly watch Cassandra
instances so that they don't crash. As much as I love the principle of
Cassandra, being constantly afraid of memory consumption is an issue in my
opinion. That being said, I took a new 16 GB server today, but I don't want
Cassandra to eat up everything if it is not needed, because Cassandra will
have some neighbors on this server, such as Tomcat and SolR.
And for me it is very weird that on my small instance, where I put a lot of
constraints (such as a MemtableThroughput of 6 MB), Cassandra uses 600 MB of
RAM for 6 MB of data. It seems a little bit of an overkill to me... And so
far I have failed to find any information on what this massive overhead can be...

Thank you for your answers and for taking the time to answer my questions.

2011/4/6 Paul Choi <pa...@plaxo.com>

>  You can use JMX over ssh by doing this:
> http://blog.reactive.org/2011/02/connecting-to-cassandra-jmx-via-ssh.html
> Basically, you use SSH -D to do dynamic application port forwarding.
>
>  In terms of scaling, you'll be able to afford 120GB RAM/node in 3 years
> if you're successful. Or, a machine with much less RAM and flash-based
> storage. :)
> Seriously, though, the formula in the tuning guidelines is a guideline. You
> can probably get acceptable performance with much less. If not, you can
> shard your app such that you host a few CFs per cluster. I doubt you'll need
> to though.
>
>
>   From: openvictor Open <op...@gmail.com>
> Reply-To: <us...@cassandra.apache.org>
> Date: Mon, 4 Apr 2011 18:24:25 -0400
> To: <us...@cassandra.apache.org>
> Subject: Re: Abnormal memory consumption
>
>  Okay, I see. But isn't there a big issue for scaling here?
> Imagine that I am the developer of a certain very successful website: at
> year 1 I need 20 CFs and might need 8 GB of RAM. At year 2 I need 50 CFs
> because I added functionality to my wonderful website; will I need 20 GB of
> RAM? And if at year three I have 300 column families, will I need 120 GB of
> RAM per node? Or did I miss something about memory consumption?
>
> Thank you very much,
>
> Victor
>
> 2011/4/4 Peter Schuller <pe...@infidyne.com>
>
>> > And about production: is 7 GB of RAM sufficient? Or is 11 GB the minimum?
>> > Thank you for your inputs for the JVM; I'll try to tune that.
>>
>>  Production mem reqs are mostly dependent on memtable thresholds:
>>
>>   http://www.datastax.com/docs/0.7/operations/tuning
>>
>> If you enable key caching or row caching, you will have to adjust
>> accordingly as well.
>>
>> --
>> / Peter Schuller
>>
>
>

Re: Abnormal memory consumption

Posted by Peter Schuller <pe...@infidyne.com>.
> Okay, I see. But isn't there a big issue for scaling here?
> Imagine that I am the developer of a certain very successful website: at
> year 1 I need 20 CFs and might need 8 GB of RAM. At year 2 I need 50 CFs
> because I added functionality to my wonderful website; will I need 20 GB of
> RAM? And if at year three I have 300 column families, will I need 120 GB of
> RAM per node? Or did I miss something about memory consumption?

It's up to you to size the memtable thresholds appropriately. The
primary driver for memtable threshold size is the desire to avoid
future compaction work by making the flushed memtables larger. As
such, a larger memtable threshold is typically only particularly
relevant for column families that see a lot of writes.

So, if you have 50 column families out of which 2 are very frequently
written and the remainder only rarely, there will probably not be any
great motivation to have any significant memtable thresholds for the
remainder.
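
To put rough, illustrative numbers on that (my arithmetic, not from the
thread): with 2 hot column families at a 64 MB threshold and 48
rarely-written ones turned down to, say, 4 MB each, nominal memtable space
is about 2 x 64 + 48 x 4 = 320 MB, versus 50 x 64 = 3200 MB if every column
family kept the default. It is the per-CF thresholds, not the CF count
itself, that drive the heap requirement.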

If you truly have a lot of column families, all of which receive an
equal amount of traffic, then to some extent it's a scaling issue in
the sense that you'd be forced to use lower memtable thresholds for
each column family than you would otherwise, and the result of that is
additional compaction work (meaning, less sustainable write
throughput). But you won't be forced to have 120-gig nodes (a 120-gig
heap would be problematic for other reasons anyway).

-- 
/ Peter Schuller

Re: Abnormal memory consumption

Posted by Paul Choi <pa...@plaxo.com>.
You can use JMX over ssh by doing this:
http://blog.reactive.org/2011/02/connecting-to-cassandra-jmx-via-ssh.html
Basically, you use SSH -D to do dynamic application port forwarding.
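
A minimal sketch of that setup, assuming the 0.6-era default JMX port of
8080 and picking 8123 as an arbitrary local SOCKS port:

    # Open a SOCKS proxy that tunnels through the Cassandra host:
    ssh -f -N -D 8123 user@cassandra-host
    # Point jconsole at the node's JMX endpoint via that proxy:
    jconsole -J-DsocksProxyHost=localhost -J-DsocksProxyPort=8123 \
        service:jmx:rmi:///jndi/rmi://cassandra-host:8080/jmxrmi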

In terms of scaling, you'll be able to afford 120GB RAM/node in 3 years if you're successful. Or, a machine with much less RAM and flash-based storage. :)
Seriously, though, the formula in the tuning guidelines is a guideline. You can probably get acceptable performance with much less. If not, you can shard your app such that you host a few CFs per cluster. I doubt you'll need to though.


From: openvictor Open <op...@gmail.com>
Reply-To: <us...@cassandra.apache.org>
Date: Mon, 4 Apr 2011 18:24:25 -0400
To: <us...@cassandra.apache.org>
Subject: Re: Abnormal memory consumption

Okay, I see. But isn't there a big issue for scaling here?
Imagine that I am the developer of a certain very successful website: at year 1 I need 20 CFs and might need 8 GB of RAM. At year 2 I need 50 CFs because I added functionality to my wonderful website; will I need 20 GB of RAM? And if at year three I have 300 column families, will I need 120 GB of RAM per node? Or did I miss something about memory consumption?

Thank you very much,

Victor

2011/4/4 Peter Schuller <pe...@infidyne.com>
> And about production: is 7 GB of RAM sufficient? Or is 11 GB the minimum?
> Thank you for your inputs for the JVM; I'll try to tune that.

Production mem reqs are mostly dependent on memtable thresholds:

  http://www.datastax.com/docs/0.7/operations/tuning

If you enable key caching or row caching, you will have to adjust
accordingly as well.

--
/ Peter Schuller


Re: Abnormal memory consumption

Posted by openvictor Open <op...@gmail.com>.
Okay, I see. But isn't there a big issue for scaling here?
Imagine that I am the developer of a certain very successful website: at
year 1 I need 20 CFs and might need 8 GB of RAM. At year 2 I need 50 CFs
because I added functionality to my wonderful website; will I need 20 GB of
RAM? And if at year three I have 300 column families, will I need 120 GB of
RAM per node? Or did I miss something about memory consumption?

Thank you very much,

Victor

2011/4/4 Peter Schuller <pe...@infidyne.com>

> > And about production: is 7 GB of RAM sufficient? Or is 11 GB the minimum?
> > Thank you for your inputs for the JVM; I'll try to tune that.
>
> Production mem reqs are mostly dependent on memtable thresholds:
>
>   http://www.datastax.com/docs/0.7/operations/tuning
>
> If you enable key caching or row caching, you will have to adjust
> accordingly as well.
>
> --
> / Peter Schuller
>

Re: Abnormal memory consumption

Posted by Peter Schuller <pe...@infidyne.com>.
> And about production: is 7 GB of RAM sufficient? Or is 11 GB the minimum?
> Thank you for your inputs for the JVM; I'll try to tune that.

Production mem reqs are mostly dependent on memtable thresholds:

   http://www.datastax.com/docs/0.7/operations/tuning

If you enable key caching or row caching, you will have to adjust
accordingly as well.

-- 
/ Peter Schuller

Re: Abnormal memory consumption

Posted by Victor Kabdebon <vi...@gmail.com>.
And about production: is 7 GB of RAM sufficient? Or is 11 GB the minimum?
Thank you for your inputs for the JVM; I'll try to tune that.


2011/4/4 Peter Schuller <pe...@infidyne.com>

> > You can change VM settings and tweak things like memtable thresholds
> > and in-memory compaction limits to get it down and get away with a
> > smaller heap size, but honestly I don't recommend doing so unless
> > you're willing to spend some time getting that right and probably
> > repeating some of the work in the future with future versions of
> > Cassandra.
>
> That said, if you do want to do so to give it a try, I suggest (1)
> changing cassandra-env to remove all the GC stuff:
>
> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>
> And then setting a fixed heap size, and removing the manual pinning of the
> new gen:
>
> JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"
>
> Then maybe remove the initial heap size enforcement, but that might
> not help, depending on circumstances:
>
> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
>
> And then go through cassandra.yaml and tune down all the various
> limitations. Less concurrent readers/writers, all the *_mb_* settings
> way down, and the RPC framing limitations.
>
> But let me re-iterate: I don't recommend running in any such
> configuration in production. But if you just want it running for
> testing/for just being available, with no special requirements, and
> not in production, the above might work. I haven't really tested it
> myself; there may be gotchas involved.
>
> --
> / Peter Schuller
>

Re: Abnormal memory consumption

Posted by Peter Schuller <pe...@infidyne.com>.
> You can change VM settings and tweak things like memtable thresholds
> and in-memory compaction limits to get it down and get away with a
> smaller heap size, but honestly I don't recommend doing so unless
> you're willing to spend some time getting that right and probably
> repeating some of the work in the future with future versions of
> Cassandra.

That said, if you do want to do so to give it a try, I suggest (1)
changing cassandra-env to remove all the GC stuff:

JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

And then setting a fixed heap size, and removing the manual pinning of the new gen:

JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"

Then maybe remove the initial heap size enforcement, but that might
not help, depending on circumstances:

JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"

And then go through cassandra.yaml and tune down all the various
limitations. Less concurrent readers/writers, all the *_mb_* settings
way down, and the RPC framing limitations.
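
As a rough map of where those live, assuming 0.7-era cassandra.yaml key
names (concurrent_reads, concurrent_writes, the various *_in_mb limits, and
the thrift framing settings):

    # List the relevant tunables with their line numbers:
    grep -nE 'concurrent_(reads|writes)|_in_mb|thrift_' conf/cassandra.yaml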

But let me re-iterate: I don't recommend running in any such
configuration in production. But if you just want it running for
testing/for just being available, with no special requirements, and
not in production, the above might work. I haven't really tested it
myself; there may be gotchas involved.

-- 
/ Peter Schuller

Re: Abnormal memory consumption

Posted by Peter Schuller <pe...@infidyne.com>.
> My last concern, and for me it is a flaw of Cassandra and I am sad to admit
> it because I love Cassandra: how come that for 6 MB of data, Cassandra feels
> the need to fill 500 MB of RAM? I can understand the need for, let's say,
> 100 MB because of caches and several memtables being alive at the same time.
> But 500 MB of RAM is 80 times the total amount of data I have. Redis, which
> you mentioned, uses 50 MB.

It's just the way the JVM works, particularly when using the CMS GC.
For efficiency reasons it'll tend to use up to your maximum heap size.
In the case of default Cassandra options, the initial heap size is even
specified to be equal to the maximum.

It's not that it needs that much memory for 60 MB of data; it's that
the way Cassandra is configured and run by default, in combination
with JVM behavior, means that you'll end up eating a significant
amount of memory per node. Increasing your 60 MB to 120 MB doesn't
double the amount of memory you need for your node.
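
One way to see the gap between the heap the JVM has claimed and what is
actually live in it (standard JDK tooling of that era; process discovery
via pgrep is an assumption about your setup):

    # Print heap configuration (note initial == max) and current usage:
    jmap -heap $(pgrep -f CassandraDaemon)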

You can change VM settings and tweak things like memtable thresholds
and in-memory compaction limits to get it down and get away with a
smaller heap size, but honestly I don't recommend doing so unless
you're willing to spend some time getting that right and probably
repeating some of the work in the future with future versions of
Cassandra.

I think the bottom line is that Cassandra isn't really primarily
intended, out of the box, to run as one of N database servers on a
single machine, and no effort is put into trying to make Cassandra
very useful for very small heap sizes.

-- 
/ Peter Schuller

Re: Abnormal memory consumption

Posted by openvictor Open <op...@gmail.com>.
Hey Aaron,

Thank you for your kind answer.
This is a test server; the production server (a single instance at the
moment) has 8 GB (or 12 GB, not decided yet) of RAM. But alongside it there
are other things running, such as:

Solr, Redis, PostgreSQL and Tomcat. Together they take up to 1 GB of RAM
when running and loaded. This is a personal open source project and I am a
student, so I don't have a lot of money, but to be clear: Cassandra is used
as a "safe" where I keep all the information. This information is then
distributed to Redis, PostgreSQL and SolR so it can be exploited and then
redistributed to the users of the website. My concern is: is Cassandra going
to be able to live within 7 GB of RAM? Or should I go for 12 (leaving it
11 GB)?

My last concern, and for me it is a flaw of Cassandra and I am sad to admit
it because I love Cassandra: how come that for 6 MB of data, Cassandra feels
the need to fill 500 MB of RAM? I can understand the need for, let's say,
100 MB because of caches and several memtables being alive at the same time.
But 500 MB of RAM is 80 times the total amount of data I have. Redis, which
you mentioned, uses 50 MB.


Victor.

2011/4/4 aaron morton <aa...@thelastpickle.com>

> For background see the JVM Heap Size section here
> http://wiki.apache.org/cassandra/MemtableThresholds
>
> You can also add a fudge factor of anywhere from 2x to 8x to the size of
> the memtables. You are in for a very difficult time trying to run Cassandra
> with under 500 MB of heap space.
>
> Is this just a test, or are you hoping to run it in production like this?
> If you need a small, single-instance, schema-free data store, would Redis
> suit your needs?
>
> Hope that helps.
> Aaron
>
> On 2 Apr 2011, at 01:34, openvictor Open wrote:
>
> > Hello everybody,
> >
> > I am quite new to Cassandra and I am worried about an Apache Cassandra
> > server that is running on a small isolated server with only 2 GB of RAM.
> > On this server there is very little data in Cassandra (~3 MB, only text in
> > column values), but there are other services such as SolR, Tomcat, Redis
> > and PostgreSQL. There are quite a lot of column families (about 15), but
> > some column families are empty at the moment. At the moment memory
> > consumption is 484 MB resident and 948,556 KB virtual.
> >
> > I modified storage-conf.xml (I am running Apache Cassandra 0.6.11): I set
> > DiskAccessMode to standard since I am running on 64-bit Debian. I also set
> > the MemtableThroughput to 16 MB instead of 64 MB, and I lowered the Xms
> > and Xmx values to 128M and 256M.
> >
> > My question is: where does this giant memory overhead come from (484 MB
> > for 3 MB of data seems insane)? And more importantly: how can I set
> > Cassandra to use at most, let's say, 500 MB? Because at this rate Cassandra
> > will be well over that limit soon.
> > For information: because of security constraints I cannot use JMX, unless
> > there is a way to use JMX through SSH without a graphical interface.
> >
> > Thank you for your help.
> > Victor
>
>

Re: Abnormal memory consumption

Posted by aaron morton <aa...@thelastpickle.com>.
For background see the JVM Heap Size section here 
http://wiki.apache.org/cassandra/MemtableThresholds

You can also add a fudge factor of anywhere from 2x to 8x to the size of the memtables. You are in for a very difficult time trying to run Cassandra with under 500 MB of heap space.
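
As a worked example using the numbers from this thread: the original poster's 15 column families at the reduced 16 MB MemtableThroughput give 15 x 16 MB = 240 MB of nominal memtable space, and applying the 2x-8x fudge factor puts the worst case at roughly 480 MB to 1.9 GB of heap, which is why anything under 500 MB is such a tight fit.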

Is this just a test, or are you hoping to run it in production like this? If you need a small, single-instance, schema-free data store, would Redis suit your needs?

Hope that helps.
Aaron

On 2 Apr 2011, at 01:34, openvictor Open wrote:

> Hello everybody,
> 
> I am quite new to Cassandra and I am worried about an Apache Cassandra server that is running on a small isolated server with only 2 GB of RAM. On this server there is very little data in Cassandra (~3 MB, only text in column values), but there are other services such as SolR, Tomcat, Redis and PostgreSQL. There are quite a lot of column families (about 15), but some column families are empty at the moment. At the moment memory consumption is 484 MB resident and 948,556 KB virtual.
> 
> I modified storage-conf.xml (I am running Apache Cassandra 0.6.11): I set DiskAccessMode to standard since I am running on 64-bit Debian. I also set the MemtableThroughput to 16 MB instead of 64 MB, and I lowered the Xms and Xmx values to 128M and 256M.
> 
> My question is: where does this giant memory overhead come from (484 MB for 3 MB of data seems insane)? And more importantly: how can I set Cassandra to use at most, let's say, 500 MB? Because at this rate Cassandra will be well over that limit soon.
> For information: because of security constraints I cannot use JMX, unless there is a way to use JMX through SSH without a graphical interface.
> 
> Thank you for your help.
> Victor