Posted to user@cassandra.apache.org by Peter Schuller <pe...@infidyne.com> on 2010/07/20 21:09:15 UTC

Re: Ran into an issue where Cassandra Crashed when running out of heap space

> CassandraDaemon.java (line 83) Uncaught exception in thread
> Thread[pool-1-thread-37895,5,main]
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:296)
>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:203)
>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1116)
>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)

Did someone send garbage on the wrong port, causing Thrift to try to
read some huge string in the RPC layer? There is a bug filed upstream
with Thrift about this, but I couldn't find it just now.
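To make the failure mode concrete: TBinaryProtocol reads a big-endian i32 length before each string, then allocates that many bytes. A minimal sketch (not Cassandra code, just the byte arithmetic) of what happens when, say, an HTTP request lands on the Thrift port:

```java
import java.nio.ByteBuffer;

public class GarbageLength {
    public static void main(String[] args) {
        // An HTTP request accidentally sent to the Thrift port starts with
        // the ASCII bytes "GET ". Interpreted as a big-endian i32 string
        // length, they decode to roughly 1.2 billion, so readStringBody
        // tries to allocate ~1.2 GB and the thread dies with OOM.
        byte[] garbage = "GET / HTTP/1.1".getBytes();
        int bogusLength = ByteBuffer.wrap(garbage).getInt();
        System.out.println(bogusLength); // 1195725856
    }
}
```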

> Is there a problem with Garbage Collection? Should I restart my
> servers every few days?

No. The CMS collector will show some slow, delayed growth as a result
of fragmentation in the old generation (just as malloc would), but in
general you should expect memory use to stabilize. Most likely you are
either triggering an out-of-memory condition through some kind of
explosive memory use (such as the garbage-on-thrift-port case, or some
enormous mutation request), or you are legitimately using too much
memory, in which case you may look into adjusting cache sizes and
memtable flushing thresholds.
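For the 0.6 line those knobs live in storage-conf.xml; the element and attribute names below are from memory and should be verified against your installed config file:

```xml
<!-- Sketch of 0.6-era storage-conf.xml knobs; names from memory,
     verify against your own config. -->
<MemtableThroughputInMB>64</MemtableThroughputInMB>
<MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
<!-- Per-CF cache sizes: absolute counts or percentages. -->
<ColumnFamily Name="Standard2" CompareWith="UTF8Type"
              KeysCached="200000" RowsCached="0"/>
```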

If your version of Cassandra logs GCs (I'm not sure whether 0.6.x
does), legitimate heap growth should be obvious from GC messages in
the Cassandra system log. You can also run with -XX:+PrintGC and
-XX:+PrintGCDetails to get GC logs from the JVM on stdout, and with
-Xloggc:path/to/log to redirect that GC output to a file.

You may want to use something like VisualVM or JConsole to attach to
Cassandra and monitor memory usage, if you prefer that to reading the
log output.
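If you'd rather poll than watch a GUI, the same numbers JConsole shows are exposed in-process through the standard java.lang.management API; a minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapSnapshot {
    public static void main(String[] args) {
        // Read the JVM's own heap usage via the MemoryMXBean -- the same
        // JMX bean JConsole/VisualVM read remotely.
        MemoryUsage heap =
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("used=" + heap.getUsed()
                + " committed=" + heap.getCommitted()
                + " max=" + heap.getMax());
    }
}
```

The same bean is reachable remotely over JMX, which is how you'd sample a production node without attaching a GUI.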

-- 
/ Peter Schuller

Re: Ran into an issue where Cassandra Crashed when running out of heap space

Posted by Ryan King <ry...@twitter.com>.
On Tue, Jul 20, 2010 at 1:28 PM, Peter Schuller
<pe...@infidyne.com> wrote:
>> Attaching Jconsole shows that there is a growth of memory and weird
>> spikes. Unfortunately I did not take a screen shot of the growth of
>> the spike over time. I'll do that when it occurs again.
>
> Note that the expected behavior for CMS is lots of small ups and
> downs from young-generation GCs, plus a longer-period cycle of
> larger ups and downs corresponding to CMS kicking in with a
> concurrent mark/sweep phase.
>
> To get a reasonable estimate of the actual live set, you would
> normally look at the free memory at the end of one of the larger dips
> after a CMS mark/sweep phase.

You'll also see memory sawtoothing for memtable growth+flush cycles.

-ryan

Re: Ran into an issue where Cassandra Crashed when running out of heap space

Posted by Peter Schuller <pe...@infidyne.com>.
> Attaching Jconsole shows that there is a growth of memory and weird
> spikes. Unfortunately I did not take a screen shot of the growth of
> the spike over time. I'll do that when it occurs again.

Note that the expected behavior for CMS is lots of small ups and
downs from young-generation GCs, plus a longer-period cycle of
larger ups and downs corresponding to CMS kicking in with a
concurrent mark/sweep phase.

To get a reasonable estimate of the actual live set, you would
normally look at the free memory at the end of one of the larger dips
after a CMS mark/sweep phase.
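If you're logging GC to a file, that estimate can be pulled out mechanically. A sketch, assuming the classic HotSpot log format of this era ("before"K->"after"K("total"K)); the log line in main is made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiveSetEstimate {
    // Matches heap transitions like "812345K->204812K(1048576K)" in
    // classic HotSpot -Xloggc output.
    private static final Pattern HEAP =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    // Heap occupancy after the collection, in KB, or -1 if no match.
    public static long afterKb(String logLine) {
        Matcher m = HEAP.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        // The smallest "after" value across full/CMS cycles approximates
        // the live set.
        System.out.println(
                afterKb("[Full GC 812345K->204812K(1048576K), 1.23 secs]"));
    }
}
```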

-- 
/ Peter Schuller

Re: Ran into an issue where Cassandra Crashed when running out of heap space

Posted by Dathan Pattishall <da...@gmail.com>.
The storage structure is rather simple.

For every 1 key there is 1 column and a timestamp for that column.

<ColumnFamily Name="Standard2" CompareWith="UTF8Type" />


We don't pull huge amounts of data, and all the other nodes are up
servicing the same requests. I suspect there may be another problem
with memory management inside Cassandra.

Attaching JConsole shows memory growth and weird spikes.
Unfortunately I did not take a screenshot of the growth of the spikes
over time. I'll do that when it occurs again.

On Tue, Jul 20, 2010 at 1:05 PM, Tristan Seligmann
<mi...@mithrandi.net> wrote:
> On Tue, Jul 20, 2010 at 9:09 PM, Peter Schuller
> <pe...@infidyne.com> wrote:
>>> CassandraDaemon.java (line 83) Uncaught exception in thread
>>> Thread[pool-1-thread-37895,5,main]
>>> java.lang.OutOfMemoryError: Java heap space
>>>         at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:296)
>>>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:203)
>>>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1116)
>>>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>         at java.lang.Thread.run(Thread.java:619)
>>
>> Did someone send garbage on the wrong port, causing thrift to try to
>> read some huge string in the RPC layer? There is a bug filed about
>> this upstream with thrift but I couldn't find it now.
>
> In particular, I've seen this happen when the client uses the wrong
> transport framing (framed vs. unframed) relative to what the server
> is configured for.
> --
> mithrandi, i Ainil en-Balandor, a faer Ambar
>

Re: Ran into an issue where Cassandra Crashed when running out of heap space

Posted by Tristan Seligmann <mi...@mithrandi.net>.
On Tue, Jul 20, 2010 at 9:09 PM, Peter Schuller
<pe...@infidyne.com> wrote:
>> CassandraDaemon.java (line 83) Uncaught exception in thread
>> Thread[pool-1-thread-37895,5,main]
>> java.lang.OutOfMemoryError: Java heap space
>>         at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:296)
>>         at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:203)
>>         at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1116)
>>         at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:619)
>
> Did someone send garbage on the wrong port, causing thrift to try to
> read some huge string in the RPC layer? There is a bug filed about
> this upstream with thrift but I couldn't find it now.

In particular, I've seen this happen when the client uses the wrong
transport framing (framed vs. unframed) relative to what the server
is configured for.
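The byte-level reason the mismatch goes wrong: a framed client prepends a 4-byte payload length, while an unframed strict-mode message begins with the TBinaryProtocol version word (0x80 0x01 ...). A server expecting one layout reads the other's first four bytes as a length or version and misparses from there. A small stdlib-only sketch of the misread:

```java
import java.nio.ByteBuffer;

public class FramingMismatch {
    public static void main(String[] args) {
        // First four bytes of an unframed strict TBinaryProtocol CALL
        // message: version 0x8001 followed by message type 1.
        byte[] unframedStart = {(byte) 0x80, 0x01, 0x00, 0x01};
        // A framed server reads these same bytes as a frame length and
        // gets a large negative number -- instant protocol confusion.
        int asFrameLength = ByteBuffer.wrap(unframedStart).getInt();
        System.out.println(asFrameLength); // -2147418111
    }
}
```

The reverse direction (unframed server, framed client) can feed similarly bogus values into readStringBody, which is the OOM path in the stack trace above.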
-- 
mithrandi, i Ainil en-Balandor, a faer Ambar