Posted to user@cassandra.apache.org by Narendra Sharma <na...@gmail.com> on 2014/01/02 17:13:56 UTC

Cassandra 1.1.6 crash without any exception or error in log

8-node cluster running in AWS. Any pointers on where I should start looking?
No kill -9 in the shell history.

Re: Cassandra 1.1.6 crash without any exception or error in log

Posted by Robert Coli <rc...@eventbrite.com>.

On Thu, Jan 2, 2014 at 10:03 PM, Nitin Sharma <ni...@bloomreach.com> wrote:

> I would recommend always running cassandra with
> -XX:+HeapDumpOnOutOfMemoryError. This dumps out a *.hprof file if the
> process dies due to OOM
>

If you do this, be sure to configure the heap dump directory, or ensure
that the running directory of cassandra (often /home/cassandra) is on a
partition with enough free space to hold a dump the size of your heap.
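
As a rough illustration of that check, here is a minimal Python sketch. It
assumes you know your max heap size in bytes. The function names and the
8 GiB figure are hypothetical, and the temp directory only stands in for
whatever dump directory you actually use. -XX:HeapDumpPath is the HotSpot
option that sets where the dump is written.

```python
import shutil
import tempfile
from typing import List


def dump_dir_has_room(dump_dir: str, max_heap_bytes: int) -> bool:
    """Return True if the partition holding dump_dir has at least
    max_heap_bytes free, i.e. room for a dump the size of the heap."""
    free = shutil.disk_usage(dump_dir).free
    return free >= max_heap_bytes


def heap_dump_jvm_opts(dump_dir: str) -> List[str]:
    """Build the two HotSpot options: dump on OutOfMemoryError, and
    write the dump into dump_dir instead of the working directory."""
    return [
        "-XX:+HeapDumpOnOutOfMemoryError",
        "-XX:HeapDumpPath=" + dump_dir,
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    dump_dir = tempfile.gettempdir()
    max_heap = 8 * 1024 ** 3  # assumed 8 GiB heap
    print("enough space:", dump_dir_has_room(dump_dir, max_heap))
    print(" ".join(heap_dump_jvm_opts(dump_dir)))
```

The space check is only a lower bound: it says a dump of a full heap would
fit right now, not that the space will still be free when the dump happens.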

=Rob

Re: Cassandra 1.1.6 crash without any exception or error in log

Posted by Narendra Sharma <na...@gmail.com>.

In this case the Java/Cassandra process never ran out of heap; it still had
20% of its heap free. It was the OS that ran out of memory, which is a side
effect of running with a large heap. I was aware of Java's inefficiency with
large heaps but had to keep the heap large because of our large bloom
filters. Note that we are still on 1.1.x.
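
To make the "heap is fine but the OS is not" situation easier to spot, here
is a small Python sketch that compares a configured max heap against the
machine's total RAM as reported in /proc/meminfo. The function names and the
sample numbers are made up for illustration. Keep in mind that a JVM's real
footprint is larger than its heap (thread stacks, native allocations and so
on), so the heap share is only a lower bound on what the process needs.

```python
def mem_total_bytes(meminfo_text: str) -> int:
    """Extract MemTotal from /proc/meminfo-style text and return bytes.
    The MemTotal line reports its value in kB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def heap_fraction_of_ram(max_heap_bytes: int, meminfo_text: str) -> float:
    """Return the configured max heap as a fraction of physical RAM."""
    return max_heap_bytes / mem_total_bytes(meminfo_text)


if __name__ == "__main__":
    # Illustrative sample text, not taken from the cluster in this thread.
    sample = "MemTotal:       16384000 kB\nMemFree:          204800 kB\n"
    heap = 8 * 1024 ** 3  # assumed 8 GiB max heap
    print("heap share of RAM: %.3f" % heap_fraction_of_ram(heap, sample))
```

On a real node you would pass in the contents of /proc/meminfo instead of
the sample string.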




-- 
Narendra Sharma
Software Engineer
http://www.aeris.com
http://narendrasharma.blogspot.com/

Re: Cassandra 1.1.6 crash without any exception or error in log

Posted by Nitin Sharma <ni...@bloomreach.com>.

I would recommend always running Cassandra with
-XX:+HeapDumpOnOutOfMemoryError. This writes a *.hprof file when the JVM
throws an OutOfMemoryError.

You can later analyze the hprof files using Eclipse Memory Analyzer (Eclipse
MAT, http://www.eclipse.org/mat) to figure out root causes and potential
leaks.

Hope this helps
-- Nitin



Re: Cassandra 1.1.6 crash without any exception or error in log

Posted by Narendra Sharma <na...@gmail.com>.

The root cause turned out to be the large heap. The Linux OOM Killer
(http://linux-mm.org/OOM_Killer) killed the process. It took some time to
figure out, but it was very interesting. We knew a large heap was a problem,
but we had no clue why the process disappeared when actual heap usage was
well within the limit. syslog helped us figure this out.

About Linux OOM Killer
"It is the job of the linux 'oom killer' to *sacrifice* one or more
processes in order to free up memory for the system when all else fails"
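
For anyone hitting the same thing, here is a minimal Python sketch of the
kind of syslog scan that turns this up. It assumes the kernel logs a line
containing "Killed process <pid> (<name>)" when the OOM killer acts. The
exact wording varies across kernel versions, and the log location depends on
the distro (commonly /var/log/syslog or /var/log/messages). The sample lines
are made up for illustration and are not from our nodes.

```python
import re
from typing import Iterable, List, Tuple

# Matches the "Killed process 4242 (java)" part of an OOM-killer log line.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every OOM-killer kill in the log."""
    hits = []
    for line in log_lines:
        match = KILLED_RE.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


# Illustrative sample lines only.
SAMPLE_LOG = [
    "Jan  2 07:58:11 node1 kernel: [12345.678] Out of memory: "
    "Kill process 4242 (java) score 901 or sacrifice child",
    "Jan  2 07:58:11 node1 kernel: [12345.679] Killed process 4242 (java) "
    "total-vm:30000000kB, anon-rss:14000000kB, file-rss:0kB",
    "Jan  2 07:58:12 node1 cron[811]: (root) CMD (run-parts /etc/cron.hourly)",
]

if __name__ == "__main__":
    for pid, name in find_oom_kills(SAMPLE_LOG):
        print("OOM killer killed pid %d (%s)" % (pid, name))
```

To run it against a real log, pass an open file object in place of
SAMPLE_LOG; iterating over the file yields its lines.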


Re: Cassandra 1.1.6 crash without any exception or error in log

Posted by Robert Coli <rc...@eventbrite.com>.

On Thu, Jan 2, 2014 at 8:13 AM, Narendra Sharma <na...@gmail.com> wrote:

> 8 node cluster running in aws. Any pointers where I should start looking?
> No kill -9 in history.
>
You should start looking at instructions as to how to upgrade to at least
the top of the 1.1 line... :D

=Rob