Posted to user@cassandra.apache.org by Ивaн Cобoлeв <so...@gmail.com> on 2012/11/17 07:07:58 UTC

Cassandra nodes failing with OOM

Dear Community,

we need your advice.

We have a cluster; 1/6 of the nodes died for various reasons (3 had OOM
messages). Nodes died in groups of 3, 1, and 2. No adjacent nodes died,
though we use SimpleSnitch.

Version:         1.1.6
Hardware:      12 GB RAM / 8 cores (virtual)
Data:              40 GB/node
Nodes:           36 nodes

Keyspaces:    2 (RF=3, R=W=2) + 1 (OpsCenter)
CFs:                36, 2 indexes
Partitioner:      Random
Compaction:   Leveled (we don't want 2x space for housekeeping)
Caching:          Keys only

All is pretty much standard, apart from one CF receiving writes in 64K
chunks and having sstable_size_in_mb=100.
No JNA installed - this is to be fixed soon.

Checking sysstat/sar, I can see 80-90% CPU idle, no anomalies in I/O, and
the only change is a spike in network activity.
Before dying, all the nodes had the following in their logs:
> INFO [ScheduledTasks:1] 2012-11-15 21:35:05,512 StatusLogger.java (line 72) MemtablePostFlusher               1         4         0
> INFO [ScheduledTasks:1] 2012-11-15 21:35:13,540 StatusLogger.java (line 72) FlushWriter                       1         3         0
> INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 72) HintedHandoff                     1         6         0
> INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 77) CompactionManager                 5         9

GCInspector warnings were there too; heap usage went from ~0.8 GB to 3 GB
in 5-10 minutes.

So, could you please give me a hint on:
1. How many GCInspector warnings per hour are considered 'normal'?
2. What should be the next thing to check?
3. What are the possible failure reasons and how to prevent those?

Thank you very much in advance,
Ivan
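To put a number on question 1, one way is to count GCInspector lines per hour directly from system.log. A minimal sketch; the line layout below is an assumption based on typical 1.1.x logs, so adjust the pattern to your actual format:

```python
import re
from collections import Counter

# Count GCInspector log lines per hour in a Cassandra system.log.
# Matches a timestamp like "2012-11-15 21:35:05,512" immediately
# followed by "GCInspector" (the logging class name).
GC_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3}\s+GCInspector")

def gc_warnings_per_hour(lines):
    """Map 'YYYY-MM-DD HH' -> number of GCInspector log lines in that hour."""
    per_hour = Counter()
    for line in lines:
        m = GC_LINE.search(line)
        if m:
            per_hour[m.group(1)] += 1
    return per_hour

sample = [
    "WARN [ScheduledTasks:1] 2012-11-15 21:35:05,512 GCInspector.java (line 145) Heap is 0.87 full...",
    "INFO [ScheduledTasks:1] 2012-11-15 21:40:11,001 GCInspector.java (line 122) GC for ParNew: 440 ms",
    "INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 72) HintedHandoff 1 6 0",
]
print(gc_warnings_per_hour(sample))  # Counter({'2012-11-15 21': 2})
```

Run it over `sorted(open("/var/log/cassandra/system.log"))` per node to see whether the rate correlates with workload or compaction windows.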

Re: Cassandra nodes failing with OOM

Posted by Ивaн Cобoлeв <so...@gmail.com>.
Hi, all,

thank you very much for the help. Aaron was right: we had a
multiget_count query which, depending on the app input, would perform a
calculation over ~40k keys.

We've released the fix, and ~100 GCInspector warnings per day per node
dropped to ~1 per day across 30 nodes :)

Thank you very much!

Ivan
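For readers hitting the same wall: the shape of such a fix is to bound how many keys any single multiget_count call touches. A hypothetical sketch; `client.multiget_count` stands in for whatever Thrift/pycassa method the application actually uses, and all names here are illustrative:

```python
# Bound the per-request key count so no single call materializes a huge
# response on the coordinator (where ~40k keys can blow the heap).

def chunked(seq, size):
    """Yield consecutive slices of `seq`, each at most `size` items long."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def count_in_batches(client, keys, batch_size=256):
    """Run multiget_count over `keys` in bounded batches, merging results."""
    counts = {}
    for batch in chunked(list(keys), batch_size):
        counts.update(client.multiget_count(batch))
    return counts
```

The batch size trades round-trips against peak memory on both client and coordinator; a few hundred keys per call is a common starting point.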

2012/11/19 Viktor Jevdokimov <Vi...@adform.com>

>  We've seen OOM in a situation when the OS was not properly prepared in
> production.
>
> http://www.datastax.com/docs/1.1/install/recommended_settings

RE: Cassandra nodes failing with OOM

Posted by Viktor Jevdokimov <Vi...@adform.com>.
We've seen OOM in situations where the OS was not properly prepared for production.

http://www.datastax.com/docs/1.1/install/recommended_settings
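From memory, the preparation that page covers centers on disabling swap and raising resource limits so the JVM heap is never paged out or starved of file handles. An illustrative fragment only; consult the linked page for the authoritative values:

```text
# /etc/security/limits.conf -- illustrative Cassandra limits
cassandra  -  memlock  unlimited
cassandra  -  nofile   100000

# Disable swap so GC pauses don't turn into page-fault storms:
#   swapoff --all   (and remove swap entries from /etc/fstab)
```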



Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: Viktor.Jevdokimov@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>



Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

From: some.unique.login@gmail.com On Behalf Of Ивaн Cобoлeв
Sent: Saturday, November 17, 2012 08:08
To: user@cassandra.apache.org
Subject: Cassandra nodes failing with OOM


Re: Cassandra nodes failing with OOM

Posted by Janne Jalkanen <ja...@ecyrd.com>.
Something that bit us recently was the size of bloom filters: we have a column family that is mostly written to and only read sequentially, so we were able to free a lot of memory and decrease GC pressure by increasing bloom_filter_fp_chance for that particular CF.

This was on 1.0.12.

/Janne
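A back-of-the-envelope for why this helps: an optimal bloom filter needs roughly -ln(p) / (ln 2)^2 bits per key at false-positive rate p, so the filter shrinks only logarithmically as you relax p. A sketch using the textbook formula; Cassandra's actual sizing is bucketed and will differ slightly:

```python
import math

def bits_per_key(p):
    """Bits per key for an optimally sized bloom filter at fp rate p."""
    return -math.log(p) / (math.log(2) ** 2)

def filter_mb(num_keys, p):
    """Approximate filter size in MB for num_keys keys at fp rate p."""
    return bits_per_key(p) * num_keys / 8 / 1024 / 1024

for p in (0.01, 0.1, 0.5):
    print(f"p={p}: {bits_per_key(p):.1f} bits/key, "
          f"{filter_mb(100_000_000, p):.0f} MB for 100M keys")
```

Going from p=0.01 to p=0.1 roughly halves the filter memory, which for a write-mostly CF with many keys can be a substantial chunk of the heap.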

On 18 Nov 2012, at 21:38, aaron morton wrote:


Re: Cassandra nodes failing with OOM

Posted by aaron morton <aa...@thelastpickle.com>.
> 1. How many GCInspector warnings per hour are considered 'normal'?
None.
A couple during compaction or repair is not the end of the world. But if you have enough to think about "per hour", it's too many.

> 2. What should be the next thing to check?
Try to determine if the GC activity correlates to application workload, compaction or repair. 

Try to determine what the working set of the server is. Watch the GC activity (via gc logs or JMX) and see what the size of the tenured heap is after a CMS. Or try to calculate it http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html

Look at your data model and query patterns for places where very large queries are being made, or rows that are very long-lived with a lot of deletes (probably not as much of an issue with leveled compaction).
 

> 3. What are the possible failure reasons and how to prevent those?

As above.
As a workaround, sometimes drastically slowing down compaction can help. For leveled compaction, try reducing in_memory_compaction_limit_in_mb and compaction_throughput_mb_per_sec.
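For reference, both knobs live in cassandra.yaml. The values below are only an illustration of "drastically slowing down", not recommendations; I believe the 1.1 defaults are 64 MB and 16 MB/s respectively:

```yaml
# cassandra.yaml -- illustrative values, not recommendations.

# Rows larger than this are compacted on disk instead of in memory
# (slower, but bounds compaction's heap usage):
in_memory_compaction_limit_in_mb: 32

# Node-wide throttle on compaction I/O:
compaction_throughput_mb_per_sec: 8
```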


Hope that helps. 

 
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/11/2012, at 7:07 PM, Ивaн Cобoлeв <so...@gmail.com> wrote:
