Posted to user@cassandra.apache.org by Michael Theroux <mt...@yahoo.com> on 2013/04/19 17:00:23 UTC

Advice on memory warning

Hello,

We've recently upgraded from m1.large to m1.xlarge instances on AWS to handle additional load and also to relieve memory pressure.  The upgrade appears to have accomplished both; however, we are still getting a warning, 0-3 times a day, on our database nodes:

WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line 145) Heap is 0.7529240824406468 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically

This is happening much less frequently than before the upgrade, but after essentially doubling the amount of available memory, I'm curious about what I can do to determine what is happening during this time.

I am collecting all the JMX statistics.  Memtable space is elevated but not extraordinarily high.  No GC messages are being output to the log.   
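
One way to watch the heap around the time of the warning, before turning on full GC logging, is sketched below (the host and <cassandra-pid> are placeholders):

    # heap usage as Cassandra reports it
    nodetool -h localhost info

    # JVM generation occupancy and GC counts, sampled every 5 seconds
    jstat -gcutil <cassandra-pid> 5000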

These warnings do seem to be occurring during compactions of column families using LCS with wide rows, but I'm not sure there is a direct correlation.

We are running Cassandra 1.1.9, with a maximum heap of 8G.  

Any advice?
Thanks,
-Mike

Re: Advice on memory warning

Posted by aaron morton <aa...@thelastpickle.com>.
There have been a lot of discussions about GC tuning on this mailing list. Here's a really quick set of guidelines I use; please search the mail archive if it does not answer your question. 

If heavy GC activity correlates with Cassandra compaction, do one or more of the following (a cassandra.yaml sketch follows the list):
* reduce concurrent_compactors to 2 or 3
* reduce compaction_throughput_mb_per_sec
* reduce in_memory_compaction_limit_in_mb
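
A minimal cassandra.yaml sketch of those three settings, assuming the 1.1-era option names (the values are illustrative only, not recommendations):

    # cassandra.yaml
    concurrent_compactors: 2                # default is typically the number of cores
    compaction_throughput_mb_per_sec: 8     # default 16; lower means gentler compaction
    in_memory_compaction_limit_in_mb: 32    # default 64; larger rows are compacted on disk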

These are heavy-handed changes designed to get things under control; you will probably want to revert some of them later. 

Enable GC logging in cassandra-env.sh and look at how much memory is still in use after a full (CMS) collection. If this is more than 50% of the heap you may end up doing a lot of GC. If you have hundreds of millions of rows per node on pre-1.2, increase the bloom_filter_fp_chance on the CFs and reduce index sampling (raise index_interval in cassandra.yaml) to reduce JVM memory use. 
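
The stock cassandra-env.sh ships with a block of GC logging options commented out; uncommenting (or adding) something like the following is enough to start (the log path is just an example):

    # cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"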

If you have wide rows, consider using the following (on 4 to 8 cores; a cassandra-env.sh sketch follows):
HEAP_NEWSIZE: 1000M
SurvivorRatio: 4
MaxTenuringThreshold: 4
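
In cassandra-env.sh terms that advice looks roughly like this; a sketch, with MAX_HEAP_SIZE shown only to match the 8G heap mentioned earlier (replace the existing SurvivorRatio/MaxTenuringThreshold lines rather than adding duplicates):

    # cassandra-env.sh
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="1000M"
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"         # stock file uses 8
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"  # stock file uses 1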

Look at the tenuring distribution in the GC log to see how many ParNew passes objects survive. If you mostly see objects at tenuring age 1 or 2, consider running with MaxTenuringThreshold 2. This can help reduce the amount of premature tenuring. 

GC problems are a combination of workload and configuration, and sometimes take a while to sort out. 

Hope that helps 
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/04/2013, at 11:53 PM, Michael Theroux <mt...@yahoo.com> wrote:

> Hello,
> 
> Just to wrap up on my part of this thread, tuning CMS compaction threshold (-XX:CMSInitiatingOccupancyFraction) to 70 appears to resolved my issues with the memory warnings.  However, I don't believe this would be a solution to all the issues mentioned below.  Although, it does make sense to me tune this value below the "flush_largest_memtables_at" value in cassandra.yaml so CMS compaction will kick in before we start flushing memtables to free memory.
> 
> Thanks!
> -Mike
> 
> On Apr 23, 2013, at 12:47 PM, Haithem Jarraya wrote:
> 
>> We are facing similar issue, and we are not able to have the ring stable.  We are using C*1.2.3 on Centos6, 32GB - RAM, 8GB-heap, 6 Nodes.
>> The total data ~ 84gb (which is relatively small for C* to handle, with a RF of 3).  Our application is heavy read, we see the GC complaints in all nodes, I copied and past the output below.
>> Also we usually see much larger values for the Pending - ReadStage, not sure what is the best advice for this.
>> 
>> Thanks,
>> 
>> Haithem
>>  
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:02,118 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 911 ms for 1 collections, 5945542968 used; max is 8199471104
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:16,051 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 322 ms for 1 collections, 5639896576 used; max is 8199471104
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,829 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 2273 ms for 1 collections, 6762618136 used; max is 8199471104
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 53) Pool Name                    Active   Pending   Blocked
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 68) ReadStage                         4         4         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) RequestResponseStage              1         6         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) ReadRepairStage                   0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) MutationStage                     0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) ReplicateOnWriteStage             0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) GossipStage                       0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) AntiEntropyStage                  0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) MigrationStage                    0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) MemtablePostFlusher               0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) FlushWriter                       0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) MiscStage                         0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) commitlog_archiver                0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) InternalResponseStage             0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) AntiEntropySessions               0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) HintedHandoff                     0         0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,843 StatusLogger.java (line 73) CompactionManager                 0         0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 85) MessagingService                n/a      15,1
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 95) Cache Type                     Size                 Capacity               KeysToSave                                                         Provider
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 96) KeyCache                  251658064                251658081                      all     
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 102) RowCache                          0                        0                      all              org.apache.cassandra.cache.SerializingCacheProvider
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 109) ColumnFamily                Memtable ops,data
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.local                              0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.peers                              0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.batchlog                           0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.NodeIdInfo                         0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.LocationInfo                       0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.Schema                             0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.Migrations                         0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.schema_keyspaces                   0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.schema_columns                     0,0
>> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.schema_columnfamilies                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.IndexInfo                          0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.range_xfers                        0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.peer_events                        0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.hints                              0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.HintsColumnFamily                  0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo2                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo3                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo4                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo5                      0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) x.foo6                 0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) x.foo7                     0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) system_auth.users                         0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) system_traces.sessions                    0,0
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) system_traces.events                      0,0
>>  WARN [ScheduledTasks:1] 2013-04-23 16:40:30,850 GCInspector.java (line 142) Heap is 0.824762725573964 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,850 StorageService.java (line 3537) Unable to reduce heap usage since there are no dirty column families
>> 
>> 
>> 
>> 
>> On 23 April 2013 16:52, Ralph Goers <ra...@dslextreme.com> wrote:
>> We are using DSE, which I believe is also 1.1.9.  We have basically had a non-usable cluster for months due to this error.  In our case, once it starts doing this it starts flushing sstables to disk and eventually fills up the disk to the point where it can't compact.  If we catch it soon enough and restart the node it usually can recover.
>> 
>> In our case, the heap size is 12 GB. As I understand it Cassandra will give 1/3 of that for sstables. I then noticed that we have one column family that is using nearly 4GB in bloom filters on each node.  Since the nodes will start doing this when the heap reaches 9GB we essentially only have 1GB of free memory so when compactions, cleanups, etc take place this situation starts happening.  We are working to change our data model to try to resolve this.
>> 
>> Ralph
>> 
>> On Apr 19, 2013, at 8:00 AM, Michael Theroux wrote:
>> 
>> > Hello,
>> >
>> > We've recently upgraded from m1.large to m1.xlarge instances on AWS to handle additional load, but to also relieve memory pressure.  It appears to have accomplished both, however, we are still getting a warning, 0-3 times a day, on our database nodes:
>> >
>> > WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line 145) Heap is 0.7529240824406468 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>> >
>> > This is happening much less frequently than before the upgrade, but after essentially doubling the amount of available memory, I'm curious on what I can do to determine what is happening during this time.
>> >
>> > I am collecting all the JMX statistics.  Memtable space is elevated but not extraordinarily high.  No GC messages are being output to the log.
>> >
>> > These warnings do seem to be occurring doing compactions of column families using LCS with wide rows, but I'm not sure there is a direct correlation.
>> >
>> > We are running Cassandra 1.1.9, with a maximum heap of 8G.
>> >
>> > Any advice?
>> > Thanks,
>> > -Mike
>> 
>> 
> 


Re: Advice on memory warning

Posted by Michael Theroux <mt...@yahoo.com>.
Hello,

Just to wrap up on my part of this thread, tuning the CMS initiating occupancy fraction (-XX:CMSInitiatingOccupancyFraction) to 70 appears to have resolved my issues with the memory warnings.  However, I don't believe this would be a solution to all the issues mentioned below.  It does make sense to me, though, to tune this value below the "flush_largest_memtables_at" value in cassandra.yaml so that the CMS collection will kick in before we start flushing memtables to free memory.
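
For reference, a sketch of how this is typically applied in cassandra-env.sh (the stock file already sets both flags with the fraction at 75; this just lowers it to 70):

    # cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"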

Thanks!
-Mike

On Apr 23, 2013, at 12:47 PM, Haithem Jarraya wrote:

> We are facing similar issue, and we are not able to have the ring stable.  We are using C*1.2.3 on Centos6, 32GB - RAM, 8GB-heap, 6 Nodes.
> The total data ~ 84gb (which is relatively small for C* to handle, with a RF of 3).  Our application is heavy read, we see the GC complaints in all nodes, I copied and past the output below.
> Also we usually see much larger values for the Pending - ReadStage, not sure what is the best advice for this.
> 
> Thanks,
> 
> Haithem
>  
> INFO [ScheduledTasks:1] 2013-04-23 16:40:02,118 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 911 ms for 1 collections, 5945542968 used; max is 8199471104
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:16,051 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 322 ms for 1 collections, 5639896576 used; max is 8199471104
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,829 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 2273 ms for 1 collections, 6762618136 used; max is 8199471104
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 53) Pool Name                    Active   Pending   Blocked
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 68) ReadStage                         4         4         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) RequestResponseStage              1         6         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) ReadRepairStage                   0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) MutationStage                     0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) ReplicateOnWriteStage             0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) GossipStage                       0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) AntiEntropyStage                  0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) MigrationStage                    0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) MemtablePostFlusher               0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) FlushWriter                       0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) MiscStage                         0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) commitlog_archiver                0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) InternalResponseStage             0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) AntiEntropySessions               0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) HintedHandoff                     0         0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,843 StatusLogger.java (line 73) CompactionManager                 0         0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 85) MessagingService                n/a      15,1
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 95) Cache Type                     Size                 Capacity               KeysToSave                                                         Provider
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 96) KeyCache                  251658064                251658081                      all     
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 102) RowCache                          0                        0                      all              org.apache.cassandra.cache.SerializingCacheProvider
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 109) ColumnFamily                Memtable ops,data
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.local                              0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.peers                              0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.batchlog                           0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) system.NodeIdInfo                         0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.LocationInfo                       0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.Schema                             0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.Migrations                         0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.schema_keyspaces                   0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.schema_columns                     0,0
> INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line 112) system.schema_columnfamilies                 0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.IndexInfo                          0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.range_xfers                        0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.peer_events                        0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.hints                              0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line 112) system.HintsColumnFamily                  0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo                 0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo2                 0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo3                 0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo4                 0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line 112) x.foo5                      0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) x.foo6                 0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) x.foo7                     0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) system_auth.users                         0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) system_traces.sessions                    0,0
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line 112) system_traces.events                      0,0
>  WARN [ScheduledTasks:1] 2013-04-23 16:40:30,850 GCInspector.java (line 142) Heap is 0.824762725573964 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,850 StorageService.java (line 3537) Unable to reduce heap usage since there are no dirty column families
> 
> 
> 
> 
> On 23 April 2013 16:52, Ralph Goers <ra...@dslextreme.com> wrote:
> We are using DSE, which I believe is also 1.1.9.  We have basically had a non-usable cluster for months due to this error.  In our case, once it starts doing this it starts flushing sstables to disk and eventually fills up the disk to the point where it can't compact.  If we catch it soon enough and restart the node it usually can recover.
> 
> In our case, the heap size is 12 GB. As I understand it Cassandra will give 1/3 of that for sstables. I then noticed that we have one column family that is using nearly 4GB in bloom filters on each node.  Since the nodes will start doing this when the heap reaches 9GB we essentially only have 1GB of free memory so when compactions, cleanups, etc take place this situation starts happening.  We are working to change our data model to try to resolve this.
> 
> Ralph
> 
> On Apr 19, 2013, at 8:00 AM, Michael Theroux wrote:
> 
> > Hello,
> >
> > We've recently upgraded from m1.large to m1.xlarge instances on AWS to handle additional load, but to also relieve memory pressure.  It appears to have accomplished both, however, we are still getting a warning, 0-3 times a day, on our database nodes:
> >
> > WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line 145) Heap is 0.7529240824406468 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
> >
> > This is happening much less frequently than before the upgrade, but after essentially doubling the amount of available memory, I'm curious on what I can do to determine what is happening during this time.
> >
> > I am collecting all the JMX statistics.  Memtable space is elevated but not extraordinarily high.  No GC messages are being output to the log.
> >
> > These warnings do seem to be occurring doing compactions of column families using LCS with wide rows, but I'm not sure there is a direct correlation.
> >
> > We are running Cassandra 1.1.9, with a maximum heap of 8G.
> >
> > Any advice?
> > Thanks,
> > -Mike
> 
> 


Re: Advice on memory warning

Posted by Haithem Jarraya <ha...@struq.com>.
We are facing a similar issue, and we are not able to keep the ring stable.
We are using C* 1.2.3 on CentOS 6 with 32GB RAM, an 8GB heap, and 6 nodes.
The total data is ~84GB (which is relatively small for C* to handle, with an
RF of 3).  Our application is read-heavy; we see the GC complaints on all
nodes, and I have copied and pasted the output below.
We also usually see much larger values for Pending ReadStage; I am not sure
what the best advice for this is.

Thanks,

Haithem

INFO [ScheduledTasks:1] 2013-04-23 16:40:02,118 GCInspector.java (line 119)
GC for ConcurrentMarkSweep: 911 ms for 1 collections, 5945542968 used; max
is 8199471104
 INFO [ScheduledTasks:1] 2013-04-23 16:40:16,051 GCInspector.java (line
119) GC for ConcurrentMarkSweep: 322 ms for 1 collections, 5639896576 used;
max is 8199471104
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,829 GCInspector.java (line
119) GC for ConcurrentMarkSweep: 2273 ms for 1 collections, 6762618136
used; max is 8199471104
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line
53) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line
68) ReadStage                         4         4         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line
68) RequestResponseStage              1         6         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line
68) ReadRepairStage                   0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line
68) MutationStage                     0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line
68) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line
68) GossipStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line
68) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line
68) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line
68) MemtablePostFlusher               0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line
68) FlushWriter                       0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line
68) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line
68) commitlog_archiver                0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line
68) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line
68) AntiEntropySessions               0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line
68) HintedHandoff                     0         0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,843 StatusLogger.java (line
73) CompactionManager                 0         0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line
85) MessagingService                n/a      15,1
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line
95) Cache Type                     Size                 Capacity
    KeysToSave
Provider
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line
96) KeyCache                  251658064                251658081
           all
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line
102) RowCache                          0                        0
           all
 org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line
109) ColumnFamily                Memtable ops,data
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line
112) system.local                              0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line
112) system.peers                              0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line
112) system.batchlog                           0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line
112) system.NodeIdInfo                         0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line
112) system.LocationInfo                       0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line
112) system.Schema                             0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line
112) system.Migrations                         0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line
112) system.schema_keyspaces                   0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line
112) system.schema_columns                     0,0
INFO [ScheduledTasks:1] 2013-04-23 16:40:30,846 StatusLogger.java (line
112) system.schema_columnfamilies                 0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line
112) system.IndexInfo                          0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line
112) system.range_xfers                        0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line
112) system.peer_events                        0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line
112) system.hints                              0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,847 StatusLogger.java (line
112) system.HintsColumnFamily                  0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line
112) x.foo                 0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line
112) x.foo2                 0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line
112) x.foo3                 0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line
112) x.foo4                 0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,848 StatusLogger.java (line
112) x.foo5                      0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line
112) x.foo6                 0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line
112) x.foo7                     0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line
112) system_auth.users                         0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line
112) system_traces.sessions                    0,0
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,849 StatusLogger.java (line
112) system_traces.events                      0,0
 WARN [ScheduledTasks:1] 2013-04-23 16:40:30,850 GCInspector.java (line
142) Heap is 0.824762725573964 full.  You may need to reduce memtable
and/or cache sizes.  Cassandra will now flush up to the two largest
memtables to free up memory.  Adjust flush_largest_memtables_at threshold
in cassandra.yaml if you don't want Cassandra to do this automatically
 INFO [ScheduledTasks:1] 2013-04-23 16:40:30,850 StorageService.java (line
3537) Unable to reduce heap usage since there are no dirty column families




On 23 April 2013 16:52, Ralph Goers <ra...@dslextreme.com> wrote:

> We are using DSE, which I believe is also 1.1.9.  We have basically had a
> non-usable cluster for months due to this error.  In our case, once it
> starts doing this it starts flushing sstables to disk and eventually fills
> up the disk to the point where it can't compact.  If we catch it soon
> enough and restart the node it usually can recover.
>
> In our case, the heap size is 12 GB. As I understand it Cassandra will
> give 1/3 of that for sstables. I then noticed that we have one column
> family that is using nearly 4GB in bloom filters on each node.  Since the
> nodes will start doing this when the heap reaches 9GB we essentially only
> have 1GB of free memory so when compactions, cleanups, etc take place this
> situation starts happening.  We are working to change our data model to try
> to resolve this.
>
> Ralph
>
> On Apr 19, 2013, at 8:00 AM, Michael Theroux wrote:
>
> > Hello,
> >
> > We've recently upgraded from m1.large to m1.xlarge instances on AWS to
> handle additional load, but to also relieve memory pressure.  It appears to
> have accomplished both, however, we are still getting a warning, 0-3 times
> a day, on our database nodes:
> >
> > WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line
> 145) Heap is 0.7529240824406468 full.  You may need to reduce memtable
> and/or cache sizes.  Cassandra will now flush up to the two largest
> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
> in cassandra.yaml if you don't want Cassandra to do this automatically
> >
> > This is happening much less frequently than before the upgrade, but
> after essentially doubling the amount of available memory, I'm curious on
> what I can do to determine what is happening during this time.
> >
> > I am collecting all the JMX statistics.  Memtable space is elevated but
> not extraordinarily high.  No GC messages are being output to the log.
> >
> > These warnings do seem to be occurring doing compactions of column
> families using LCS with wide rows, but I'm not sure there is a direct
> correlation.
> >
> > We are running Cassandra 1.1.9, with a maximum heap of 8G.
> >
> > Any advice?
> > Thanks,
> > -Mike
>
>

Re: Advice on memory warning

Posted by Ralph Goers <ra...@dslextreme.com>.
We are using DSE, which I believe is also on 1.1.9.  We have basically had a non-usable cluster for months due to this error.  In our case, once a node starts doing this it starts flushing memtables to disk and eventually fills up the disk to the point where it can't compact.  If we catch it soon enough and restart the node, it can usually recover.

In our case, the heap size is 12 GB.  As I understand it, Cassandra will give 1/3 of that to memtables.  I then noticed that we have one column family that is using nearly 4GB in bloom filters on each node.  Since the nodes will start doing this when the heap reaches 9GB, we essentially only have 1GB of free memory, so when compactions, cleanups, etc. take place this situation starts happening.  We are working to change our data model to try to resolve this.
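
A sketch of two things that may help here: nodetool cfstats reports bloom filter memory per column family, and raising bloom_filter_fp_chance on the offending CF shrinks its filters (the CF name below is a placeholder, and existing sstables only pick up the new value as they are rewritten, e.g. by compaction or nodetool scrub/upgradesstables):

    # bloom filter memory per column family
    nodetool -h localhost cfstats | grep -E 'Column Family|Bloom Filter Space Used'

    # cassandra-cli, 1.1-era syntax; a higher fp chance means a smaller filter
    update column family WideRowCF with bloom_filter_fp_chance = 0.01;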

Ralph 

On Apr 19, 2013, at 8:00 AM, Michael Theroux wrote:

> Hello,
> 
> We've recently upgraded from m1.large to m1.xlarge instances on AWS to handle additional load, but to also relieve memory pressure.  It appears to have accomplished both, however, we are still getting a warning, 0-3 times a day, on our database nodes:
> 
> WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line 145) Heap is 0.7529240824406468 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
> 
> This is happening much less frequently than before the upgrade, but after essentially doubling the amount of available memory, I'm curious on what I can do to determine what is happening during this time.  
> 
> I am collecting all the JMX statistics.  Memtable space is elevated but not extraordinarily high.  No GC messages are being output to the log.   
> 
> These warnings do seem to be occurring doing compactions of column families using LCS with wide rows, but I'm not sure there is a direct correlation.    
> 
> We are running Cassandra 1.1.9, with a maximum heap of 8G.  
> 
> Any advice?
> Thanks,
> -Mike


RE: Advice on memory warning

Posted by mo...@barclays.com.
My experience (running C* 1.2.2): 

1. I also observe that this occurs during compaction. 
2. I have never yet seen a node recover from this state. Once it starts complaining about heap, it enters a death spiral of futile attempts to fix the situation. Eventually the node spends so long running GC that it appears to be down to the other nodes. 
3. Do you observe a lot of pending MemtablePostFlusher tasks in the log? I believe this is another symptom (see the nodetool sketch below).

Like you, I would love to get advice on what causes this and how to avoid it.
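
A quick way to check for the pending MemtablePostFlusher tasks mentioned in point 3 (and for pending reads generally) without digging through the log:

    # Active/Pending/Completed/Blocked counts per thread pool, including MemtablePostFlusher
    nodetool -h localhost tpstats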

-----Original Message-----
From: Michael Theroux [mailto:mtheroux2@yahoo.com] 
Sent: Friday, April 19, 2013 6:00 PM
To: user@cassandra.apache.org
Subject: Advice on memory warning

Hello,

We've recently upgraded from m1.large to m1.xlarge instances on AWS to handle additional load, but to also relieve memory pressure.  It appears to have accomplished both, however, we are still getting a warning, 0-3 times a day, on our database nodes:

WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line 145) Heap is 0.7529240824406468 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically

This is happening much less frequently than before the upgrade, but after essentially doubling the amount of available memory, I'm curious on what I can do to determine what is happening during this time.  

I am collecting all the JMX statistics.  Memtable space is elevated but not extraordinarily high.  No GC messages are being output to the log.   

These warnings do seem to be occurring doing compactions of column families using LCS with wide rows, but I'm not sure there is a direct correlation.    

We are running Cassandra 1.1.9, with a maximum heap of 8G.  

Any advice?
Thanks,
-Mike

Re: Advice on memory warning

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
I would have said the exact opposite, but I am not really sure.

I have configured the threshold to 80% of the heap since I have an 8GB heap.
I think the purpose of this threshold is to keep a safety margin to avoid
OOMing. C* can be configured with a 1GB heap, so there the margin is about
250MB. On an 8GB heap you have 2GB of unused heap (since memtables will be
forced to flush when you reach 6GB). My guess is that the more memory you
have in the heap, the higher you can set this threshold.
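
For anyone following along, the threshold being discussed lives in
cassandra.yaml; a sketch (0.80 is the value mentioned above, not a
recommendation):

    # cassandra.yaml
    flush_largest_memtables_at: 0.80    # default 0.75; heap fraction, checked after each full (CMS) GC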

Yet this isn't really a solution, since you shouldn't be reaching such high
heap usage; it is more of a workaround.

Once again, this is my interpretation; I could be wrong.

Alain


2013/4/21 Derek Williams <de...@fyrie.net>

> The CMS compaction threshold is usually set to 75% as well, it might help
> to set it lower to 70% to see if that resolves these warnings as Cassandra
> will start CMS GC before it hits the 75% warning.
>
> There is also a setting to lower the max amount of memory used for
> compacting each row. This may cause compactions to take longer though as
> each row that surpasses this limit will need to be compacted on disk
> instead of in memory.
>
>
> On Fri, Apr 19, 2013 at 9:00 AM, Michael Theroux <mt...@yahoo.com>wrote:
>
>> Hello,
>>
>> We've recently upgraded from m1.large to m1.xlarge instances on AWS to
>> handle additional load, but to also relieve memory pressure.  It appears to
>> have accomplished both, however, we are still getting a warning, 0-3 times
>> a day, on our database nodes:
>>
>> WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line
>> 145) Heap is 0.7529240824406468 full.  You may need to reduce memtable
>> and/or cache sizes.  Cassandra will now flush up to the two largest
>> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
>> in cassandra.yaml if you don't want Cassandra to do this automatically
>>
>> This is happening much less frequently than before the upgrade, but after
>> essentially doubling the amount of available memory, I'm curious on what I
>> can do to determine what is happening during this time.
>>
>> I am collecting all the JMX statistics.  Memtable space is elevated but
>> not extraordinarily high.  No GC messages are being output to the log.
>>
>> These warnings do seem to be occurring doing compactions of column
>> families using LCS with wide rows, but I'm not sure there is a direct
>> correlation.
>>
>> We are running Cassandra 1.1.9, with a maximum heap of 8G.
>>
>> Any advice?
>> Thanks,
>> -Mike
>
>
>
>
> --
> Derek Williams
>

Re: Advice on memory warning

Posted by Derek Williams <de...@fyrie.net>.
The CMS initiating occupancy threshold is usually set to 75% as well; it
might help to set it lower, to 70%, to see if that resolves these warnings,
as Cassandra will then start CMS GC before it hits the 75% warning.

There is also a setting to lower the max amount of memory used for
compacting each row. This may cause compactions to take longer though as
each row that surpasses this limit will need to be compacted on disk
instead of in memory.
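
The per-row setting Derek refers to is, I believe,
in_memory_compaction_limit_in_mb in cassandra.yaml; a sketch with an
illustrative value:

    # cassandra.yaml
    in_memory_compaction_limit_in_mb: 32    # default 64; rows larger than this are compacted on disk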


On Fri, Apr 19, 2013 at 9:00 AM, Michael Theroux <mt...@yahoo.com>wrote:

> Hello,
>
> We've recently upgraded from m1.large to m1.xlarge instances on AWS to
> handle additional load, but to also relieve memory pressure.  It appears to
> have accomplished both, however, we are still getting a warning, 0-3 times
> a day, on our database nodes:
>
> WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line
> 145) Heap is 0.7529240824406468 full.  You may need to reduce memtable
> and/or cache sizes.  Cassandra will now flush up to the two largest
> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
> in cassandra.yaml if you don't want Cassandra to do this automatically
>
> This is happening much less frequently than before the upgrade, but after
> essentially doubling the amount of available memory, I'm curious on what I
> can do to determine what is happening during this time.
>
> I am collecting all the JMX statistics.  Memtable space is elevated but
> not extraordinarily high.  No GC messages are being output to the log.
>
> These warnings do seem to be occurring doing compactions of column
> families using LCS with wide rows, but I'm not sure there is a direct
> correlation.
>
> We are running Cassandra 1.1.9, with a maximum heap of 8G.
>
> Any advice?
> Thanks,
> -Mike




-- 
Derek Williams