Posted to user@cassandra.apache.org by Vasileios Vlachos <va...@gmail.com> on 2012/04/10 15:54:01 UTC

Cassandra running out of memory?

Hello,

We have been experimenting a bit with Cassandra lately (version 1.0.7) and we
seem to have some problems with memory. We use EC2 as our test environment
and we have three nodes with 3.7GB of memory and one core @ 2.4GHz, all
running Ubuntu Server 11.10.

The problem is that the node we hit from our Thrift interface dies
regularly (approximately after we store 2-2.5GB of data). The error message is
OutOfMemoryError: Java Heap Space, and according to the log the node did in
fact use all of the allocated heap.

The nodes are under relatively constant load and store about 2000-4000 row
keys a minute, which are batched through the Thrift interface in groups of
10-30 row keys at a time (with about 50 columns each). The number of reads is
very low, around 1000-2000 a day, and each read requests the data of a single
row key. There is currently only one column family in use.
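
For reference, our writes look roughly like the sketch below (pycassa is used
here purely for illustration, and the keyspace, column family and helper
names are made up rather than our real schema):

    import pycassa

    # Connect to one node; the host/port and keyspace name are illustrative.
    pool = pycassa.ConnectionPool('TestKeyspace', server_list=['10.0.0.1:9160'])
    cf = pycassa.ColumnFamily(pool, 'Events')

    # Queue mutations client-side and flush them in groups of ~30 row keys.
    batch = cf.batch(queue_size=30)
    for row_key, columns in incoming_rows():  # hypothetical generator; ~50 columns per row
        batch.insert(row_key, columns)        # columns is a dict of column name -> value
    batch.send()                              # flush anything still queued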

The initial thought was that something was wrong in the cassandra-env.sh
file. So, we set the variables 'system_memory_in_mb' (3760) and
'system_cpu_cores' (1) according to our nodes' specification. We also changed
'MAX_HEAP_SIZE' to 2G and 'HEAP_NEWSIZE' to 200M (we believe the latter
relates to garbage collection). Unfortunately, that did not solve the issue
and the node we hit via Thrift keeps dying regularly.
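
In other words, the only overrides in our cassandra-env.sh are roughly the
following (everything else is left at the defaults):

    system_memory_in_mb="3760"
    system_cpu_cores="1"
    MAX_HEAP_SIZE="2G"
    HEAP_NEWSIZE="200M"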

In case you find this useful: swap is off, and unevictable memory seems to
be very high on all 3 servers (2.3GB, whereas on other Linux servers we
usually observe around 0-16KB of unevictable memory). We are not quite sure
how the unevictable memory ties into Cassandra, it's just something we
noticed while looking into the problem. The CPU is pretty much idle the
entire time. According to nodetool, heap usage does drop once in a while,
but it steadily grows past the limit as time goes by.
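
For completeness, these are roughly the checks behind those observations
(nodetool is the only Cassandra-specific tool here, the rest are standard
Linux utilities):

    swapon -s                       # lists active swap devices (none in our case)
    grep Unevictable /proc/meminfo  # shows the ~2.3GB of unevictable memory
    nodetool -h localhost info      # reports heap used / heap max for the node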

Any ideas? Thanks in advance.

Bill

Re: Cassandra running out of memory?

Posted by Vasileios Vlachos <va...@gmail.com>.
Thank you, Aaron. 8GB of memory is about the spec we are now using for testing.

I observed a couple of other things when I checked the output.log file, but
I think those should go in a separate post.

Thank you very much for your advice.

Bill


On 13/04/12 02:49, aaron morton wrote:
> It depends on a lot of things: schema size, caches, workload, etc.
>
> If you are just starting out I would recommend using a machine with
> 8GB or 16GB of total RAM. By default Cassandra will take about 4GB or 8GB
> (respectively) for the JVM.
>
> Once you have a feel for how things work you should be able to 
> estimate the resources your application will need.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/04/2012, at 2:19 AM, Vasileios Vlachos wrote:
>
>> Hello Aaron,
>>
>> Thank you for getting back to me.
>>
>> I will change to m1.large first to see how long it will take a
>> Cassandra node to die (if at all). If I am still not happy I will try
>> more memory. I just want to test it step by step and see what the
>> differences are. I will also change the cassandra-env file back to the
>> defaults.
>>
>> Is there an absolute minimum requirement for Cassandra in terms of 
>> memory? I might be wrong, but from my understanding we shouldn't have 
>> any problems given the amount of data we store per day (currently 
>> approximately 2-2.5G / day).
>>
>> Thank you in advance,
>>
>> Bill
>>
>>
>> On Wed, Apr 11, 2012 at 7:33 PM, aaron morton 
>> <aaron@thelastpickle.com> wrote:
>>
>>>     'system_memory_in_mb' (3760) and the 'system_cpu_cores' (1)
>>>     according to our nodes' specification. We also changed the
>>>     'MAX_HEAP_SIZE' to 2G and the 'HEAP_NEWSIZE' to 200M (we think
>>>     the second is related to the Garbage Collection).
>>     It's best to leave the default settings unless you know what you
>>     are doing here.
>>
>>>     In case you find this useful, swap is off and unevictable memory
>>>     seems to be very high on all 3 servers (2.3GB, we usually
>>>     observe the amount of unevictable memory on other Linux servers
>>>     of around 0-16KB)
>>     Cassandra locks the Java memory (via mlockall) so it cannot be swapped
>>     out; locked pages are counted as unevictable, which is what you are
>>     seeing.
>>
>>>     The problem is that the node we hit from our thrift interface
>>>     dies regularly (approximately after we store 2-2.5G of data).
>>>     Error message: OutOfMemoryError: Java Heap Space and according
>>>     to the log it in fact used all of the allocated memory.
>>     The easiest solution will be to use a larger EC2 instance.
>>
>>     People normally use an m1.xlarge with 16GB of RAM (you could also
>>     try an m1.large).
>>
>>     If you are still experimenting I would suggest using the larger
>>     instances so you can make some progress. Once you have a feel for
>>     how things work you can then try to match the instances to your
>>     budget.
>>
>>     Hope that helps.
>>
>>     -----------------
>>     Aaron Morton
>>     Freelance Developer
>>     @aaronmorton
>>     http://www.thelastpickle.com
>>
>>     On 11/04/2012, at 1:54 AM, Vasileios Vlachos wrote:
>>
>>>     Hello,
>>>
>>>     We have been experimenting a bit with Cassandra lately (version
>>>     1.0.7) and we seem to have some problems with memory. We use EC2 as
>>>     our test environment and we have three nodes with 3.7GB of memory
>>>     and one core @ 2.4GHz, all running Ubuntu Server 11.10.
>>>
>>>     The problem is that the node we hit from our Thrift interface
>>>     dies regularly (approximately after we store 2-2.5GB of data). The
>>>     error message is OutOfMemoryError: Java Heap Space, and according
>>>     to the log the node did in fact use all of the allocated heap.
>>>
>>>     The nodes are under relatively constant load and store about
>>>     2000-4000 row keys a minute, which are batched through the Thrift
>>>     interface in groups of 10-30 row keys at a time (with about 50
>>>     columns each). The number of reads is very low, around 1000-2000 a
>>>     day, and each read requests the data of a single row key. There is
>>>     currently only one column family in use.
>>>
>>>     The initial thought was that something was wrong in the
>>>     cassandra-env.sh file. So, we specified the variables
>>>     'system_memory_in_mb' (3760) and the 'system_cpu_cores' (1)
>>>     according to our nodes' specification. We also changed the
>>>     'MAX_HEAP_SIZE' to 2G and the 'HEAP_NEWSIZE' to 200M (we think
>>>     the second is related to the Garbage Collection). Unfortunately,
>>>     that did not solve the issue and the node we hit via thrift
>>>     keeps on dying regularly.
>>>
>>>     In case you find this useful, swap is off and unevictable memory
>>>     seems to be very high on all 3 servers (2.3GB, we usually
>>>     observe the amount of unevictable memory on other Linux servers
>>>     of around 0-16KB) (We are not quite sure how the unevictable
>>>     memory ties into Cassandra, it's just something we observed while
>>>     looking into the problem). The CPU is pretty much idle the
>>>     entire time. The heap memory is clearly being reduced once in a
>>>     while according to nodetool, but obviously grows over the limit
>>>     as time goes by.
>>>
>>>     Any ideas? Thanks in advance.
>>>
>>>     Bill
>>
>>
>


-- 

Kind regards,

Vasileios Vlachos


Re: Cassandra running out of memory?

Posted by aaron morton <aa...@thelastpickle.com>.
It depends on a lot of things: schema size, caches, workload, etc.

If you are just starting out I would recommend using a machine with 8GB or 16GB of total RAM. By default Cassandra will take about 4GB or 8GB (respectively) for the JVM.

Once you have a feel for how things work you should be able to estimate the resources your application will need. 

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/04/2012, at 2:19 AM, Vasileios Vlachos wrote:

> Hello Aaron,
> 
> Thank you for getting back to me.
> 
> I will change to m1.large first to see how long it will take a Cassandra node to die (if at all). If I am still not happy I will try more memory. I just want to test it step by step and see what the differences are. I will also change the cassandra-env file back to the defaults.
> 
> Is there an absolute minimum requirement for Cassandra in terms of memory? I might be wrong, but from my understanding we shouldn't have any problems given the amount of data we store per day (currently approximately 2-2.5G / day).
> 
> Thank you in advance,
> 
> Bill
> 
> 
> On Wed, Apr 11, 2012 at 7:33 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> 'system_memory_in_mb' (3760) and the 'system_cpu_cores' (1) according to our nodes' specification. We also changed the 'MAX_HEAP_SIZE' to 2G and the 'HEAP_NEWSIZE' to 200M (we think the second is related to the Garbage Collection). 
> It's best to leave the default settings unless you know what you are doing here. 
> 
>> In case you find this useful, swap is off and unevictable memory seems to be very high on all 3 servers (2.3GB, we usually observe the amount of unevictable memory on other Linux servers of around 0-16KB)
> Cassandra locks the Java memory (via mlockall) so it cannot be swapped out; locked pages are counted as unevictable, which is what you are seeing.
> 
>> The problem is that the node we hit from our thrift interface dies regularly (approximately after we store 2-2.5G of data). Error message: OutOfMemoryError: Java Heap Space and according to the log it in fact used all of the allocated memory.
> The easiest solution will be to use a larger EC2 instance. 
> 
> People normally use an m1.xlarge with 16GB of RAM (you could also try an m1.large).
> 
> If you are still experimenting I would suggest using the larger instances so you can make some progress. Once you have a feel for how things work you can then try to match the instances to your budget.
> 
> Hope that helps. 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 11/04/2012, at 1:54 AM, Vasileios Vlachos wrote:
> 
>> Hello,
>> 
>> We have been experimenting a bit with Cassandra lately (version 1.0.7) and we seem to have some problems with memory. We use EC2 as our test environment and we have three nodes with 3.7GB of memory and one core @ 2.4GHz, all running Ubuntu Server 11.10.
>> 
>> The problem is that the node we hit from our Thrift interface dies regularly (approximately after we store 2-2.5GB of data). The error message is OutOfMemoryError: Java Heap Space, and according to the log the node did in fact use all of the allocated heap.
>> 
>> The nodes are under relatively constant load and store about 2000-4000 row keys a minute, which are batched through the Thrift interface in groups of 10-30 row keys at a time (with about 50 columns each). The number of reads is very low, around 1000-2000 a day, and each read requests the data of a single row key. There is currently only one column family in use.
>> 
>> The initial thought was that something was wrong in the cassandra-env.sh file. So, we specified the variables 'system_memory_in_mb' (3760) and the 'system_cpu_cores' (1) according to our nodes' specification. We also changed the 'MAX_HEAP_SIZE' to 2G and the 'HEAP_NEWSIZE' to 200M (we think the second is related to the Garbage Collection). Unfortunately, that did not solve the issue and the node we hit via thrift keeps on dying regularly.
>> 
>> In case you find this useful: swap is off, and unevictable memory seems to be very high on all 3 servers (2.3GB, whereas on other Linux servers we usually observe around 0-16KB of unevictable memory). We are not quite sure how the unevictable memory ties into Cassandra, it's just something we noticed while looking into the problem. The CPU is pretty much idle the entire time. According to nodetool, heap usage does drop once in a while, but it steadily grows past the limit as time goes by.
>> 
>> Any ideas? Thanks in advance.
>> 
>> Bill
> 
> 


Re: Cassandra running out of memory?

Posted by Vasileios Vlachos <va...@gmail.com>.
Hello Aaron,

Thank you for getting back to me.

I will change to m1.large first to see how long it will take a Cassandra node
to die (if at all). If I am still not happy I will try more memory. I just
want to test it step by step and see what the differences are. I will also
change the cassandra-env file back to the defaults.

Is there an absolute minimum requirement for Cassandra in terms of memory?
I might be wrong, but from my understanding we shouldn't have any problems
given the amount of data we store per day (currently approximately 2-2.5G /
day).

Thank you in advance,

Bill


On Wed, Apr 11, 2012 at 7:33 PM, aaron morton <aa...@thelastpickle.com> wrote:

> 'system_memory_in_mb' (3760) and the 'system_cpu_cores' (1) according to
> our nodes' specification. We also changed the 'MAX_HEAP_SIZE' to 2G and the
> 'HEAP_NEWSIZE' to 200M (we think the second is related to the Garbage
> Collection).
>
> It's best to leave the default settings unless you know what you are doing
> here.
>
> In case you find this useful, swap is off and unevictable memory seems to
> be very high on all 3 servers (2.3GB, we usually observe the amount of
> unevictable memory on other Linux servers of around 0-16KB)
>
> Cassandra locks the Java memory (via mlockall) so it cannot be swapped
> out; locked pages are counted as unevictable, which is what you are seeing.
>
> The problem is that the node we hit from our thrift interface dies
> regularly (approximately after we store 2-2.5G of data). Error message:
> OutOfMemoryError: Java Heap Space and according to the log it in fact used
> all of the allocated memory.
>
> The easiest solution will be to use a larger EC2 instance.
>
> People normally use an m1.xlarge with 16GB of RAM (you could also try an
> m1.large).
>
> If you are still experimenting I would suggest using the larger instances
> so you can make some progress. Once you have a feel for how things work you
> can then try to match the instances to your budget.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/04/2012, at 1:54 AM, Vasileios Vlachos wrote:
>
> Hello,
>
> We have been experimenting a bit with Cassandra lately (version 1.0.7) and
> we seem to have some problems with memory. We use EC2 as our test
> environment and we have three nodes with 3.7GB of memory and one core @
> 2.4GHz, all running Ubuntu Server 11.10.
>
> The problem is that the node we hit from our Thrift interface dies
> regularly (approximately after we store 2-2.5GB of data). The error message
> is OutOfMemoryError: Java Heap Space, and according to the log the node did
> in fact use all of the allocated heap.
>
> The nodes are under relatively constant load and store about 2000-4000 row
> keys a minute, which are batched through the Thrift interface in groups of
> 10-30 row keys at a time (with about 50 columns each). The number of reads
> is very low, around 1000-2000 a day, and each read requests the data of a
> single row key. There is currently only one column family in use.
>
> The initial thought was that something was wrong in the cassandra-env.sh
> file. So, we specified the variables 'system_memory_in_mb' (3760) and the
> 'system_cpu_cores' (1) according to our nodes' specification. We also
> changed the 'MAX_HEAP_SIZE' to 2G and the 'HEAP_NEWSIZE' to 200M (we think
> the second is related to the Garbage Collection). Unfortunately, that did
> not solve the issue and the node we hit via thrift keeps on dying regularly.
>
> In case you find this useful, swap is off and unevictable memory seems to
> be very high on all 3 servers (2.3GB, we usually observe the amount of
> unevictable memory on other Linux servers of around 0-16KB) (We are not
> quite sure how the unevictable memory ties into Cassandra, it's just
> something we observed while looking into the problem). The CPU is pretty
> much idle the entire time. The heap memory is clearly being reduced once in
> a while according to nodetool, but obviously grows over the limit as time
> goes by.
>
> Any ideas? Thanks in advance.
>
> Bill
>
>
>

Re: Cassandra running out of memory?

Posted by aaron morton <aa...@thelastpickle.com>.
> 'system_memory_in_mb' (3760) and the 'system_cpu_cores' (1) according to our nodes' specification. We also changed the 'MAX_HEAP_SIZE' to 2G and the 'HEAP_NEWSIZE' to 200M (we think the second is related to the Garbage Collection). 
It's best to leave the default settings unless you know what you are doing here. 

> In case you find this useful, swap is off and unevictable memory seems to be very high on all 3 servers (2.3GB, we usually observe the amount of unevictable memory on other Linux servers of around 0-16KB)
Cassandra locks the Java memory (via mlockall) so it cannot be swapped out; locked pages are counted as unevictable, which is what you are seeing.

> The problem is that the node we hit from our thrift interface dies regularly (approximately after we store 2-2.5G of data). Error message: OutOfMemoryError: Java Heap Space and according to the log it in fact used all of the allocated memory.
The easiest solution will be to use a larger EC2 instance. 

People normally use an m1.xlarge with 16GB of RAM (you could also try an m1.large).

If you are still experimenting I would suggest using the larger instances so you can make some progress. Once you have a feel for how things work you can then try to match the instances to your budget.

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/04/2012, at 1:54 AM, Vasileios Vlachos wrote:

> Hello,
> 
> We have been experimenting a bit with Cassandra lately (version 1.0.7) and we seem to have some problems with memory. We use EC2 as our test environment and we have three nodes with 3.7GB of memory and one core @ 2.4GHz, all running Ubuntu Server 11.10.
> 
> The problem is that the node we hit from our Thrift interface dies regularly (approximately after we store 2-2.5GB of data). The error message is OutOfMemoryError: Java Heap Space, and according to the log the node did in fact use all of the allocated heap.
> 
> The nodes are under relatively constant load and store about 2000-4000 row keys a minute, which are batched through the Thrift interface in groups of 10-30 row keys at a time (with about 50 columns each). The number of reads is very low, around 1000-2000 a day, and each read requests the data of a single row key. There is currently only one column family in use.
> 
> The initial thought was that something was wrong in the cassandra-env.sh file. So, we specified the variables 'system_memory_in_mb' (3760) and the 'system_cpu_cores' (1) according to our nodes' specification. We also changed the 'MAX_HEAP_SIZE' to 2G and the 'HEAP_NEWSIZE' to 200M (we think the second is related to the Garbage Collection). Unfortunately, that did not solve the issue and the node we hit via thrift keeps on dying regularly.
> 
> In case you find this useful: swap is off, and unevictable memory seems to be very high on all 3 servers (2.3GB, whereas on other Linux servers we usually observe around 0-16KB of unevictable memory). We are not quite sure how the unevictable memory ties into Cassandra, it's just something we noticed while looking into the problem. The CPU is pretty much idle the entire time. According to nodetool, heap usage does drop once in a while, but it steadily grows past the limit as time goes by.
> 
> Any ideas? Thanks in advance.
> 
> Bill