You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Guillermo Barbero <gu...@spotbros.com> on 2013/01/30 12:47:57 UTC
too many warnings of Heap is full
Hi,
I'm viewing a weird behaviour in my cassandra cluster. Most of the
warning messages are due to Heap is % full. According to this link
(http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
there are two ways to "reduce pressure":
1. Decrease the cache sizes
2. Increase the index interval size
Most of the flushes are in two column families (users and messages), I
guess that's because the most mutations are there.
I still have not applied those changes to the production environment.
Do you recommend any other meassure? Should I set specific tunning for
these two CFs? Should I check another metric?
Additionally, the distribution of warning messages is not uniform
along the cluster. Why could cassandra be doing this? What should I do
to find out how to fix this?
cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
the java version is the following:
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
The cassandra system.log is resumed here (numer of messages, cassandra
node, class that reports the message, first word of the message)
2013-01-26
5 cassNode0: GCInspector.java Heap
5 cassNode0: StorageService.java Flushing
232 cassNode2: GCInspector.java Heap
232 cassNode2: StorageService.java Flushing
104 cassNode3: GCInspector.java Heap
104 cassNode3: StorageService.java Flushing
3 cassNode4: GCInspector.java Heap
3 cassNode4: StorageService.java Flushing
3 cassNode5: GCInspector.java Heap
3 cassNode5: StorageService.java Flushing
2013-01-27
2 cassNode0: GCInspector.java Heap
2 cassNode0: StorageService.java Flushing
3 cassNode1: GCInspector.java Heap
3 cassNode1: StorageService.java Flushing
189 cassNode2: GCInspector.java Heap
189 cassNode2: StorageService.java Flushing
104 cassNode3: GCInspector.java Heap
104 cassNode3: StorageService.java Flushing
1 cassNode4: GCInspector.java Heap
1 cassNode4: StorageService.java Flushing
1 cassNode5: GCInspector.java Heap
1 cassNode5: StorageService.java Flushing
2013-01-28
2 cassNode0: GCInspector.java Heap
2 cassNode0: StorageService.java Flushing
1 cassNode1: GCInspector.java Heap
1 cassNode1: StorageService.java Flushing
1 cassNode2: AutoSavingCache.java Reducing
343 cassNode2: GCInspector.java Heap
342 cassNode2: StorageService.java Flushing
181 cassNode3: GCInspector.java Heap
181 cassNode3: StorageService.java Flushing
4 cassNode4: GCInspector.java Heap
4 cassNode4: StorageService.java Flushing
3 cassNode5: GCInspector.java Heap
3 cassNode5: StorageService.java Flushing
2013-01-29
2 cassNode0: GCInspector.java Heap
2 cassNode0: StorageService.java Flushing
3 cassNode1: GCInspector.java Heap
3 cassNode1: StorageService.java Flushing
156 cassNode2: GCInspector.java Heap
156 cassNode2: StorageService.java Flushing
71 cassNode3: GCInspector.java Heap
71 cassNode3: StorageService.java Flushing
2 cassNode4: GCInspector.java Heap
2 cassNode4: StorageService.java Flushing
2 cassNode5: GCInspector.java Heap
1 cassNode5: Memtable.java setting
2 cassNode5: StorageService.java Flushing
--
Guillermo Barbero - Backend Team
Spotbros Technologies
Re: too many warnings of Heap is full
Posted by Nate McCall <zz...@gmail.com>.
Your latencies and distribution look fine.
How big/what types of queries are you issuing? Are you issuing a lot
of large multigets?
Also, do either of these column families have secondary indexes?
On Wed, Jan 30, 2013 at 2:59 PM, Guillermo Barbero
<gu...@spotbros.com> wrote:
> Iep,
>
> I missed the attachment...
>
> Also, these are the cfstats of 152:
>
> Column Family: CF_SBMessages
> SSTable count: 3
> Space used (live): 967238560
> Space used (total): 967238560
> Number of Keys (estimate): 3263232
> Memtable Columns Count: 2112
> Memtable Data Size: 577135
> Memtable Switch Count: 5
> Read Count: 20702
> Read Latency: 0.181 ms.
> Write Count: 19888
> Write Latency: 0.059 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 14
> Bloom Filter False Ratio: 0.09375
> Bloom Filter Space Used: 6171168
> Compacted row minimum size: 536
> Compacted row maximum size: 24601
> Compacted row mean size: 850
>
> Column Family: CF_users
> SSTable count: 3
> Space used (live): 152205376
> Space used (total): 152205376
> Number of Keys (estimate): 343040
> Memtable Columns Count: 154398
> Memtable Data Size: 21159410
> Memtable Switch Count: 9
> Read Count: 11816
> Read Latency: 1.348 ms.
> Write Count: 128751
> Write Latency: 0.090 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 0
> Bloom Filter False Ratio: 0.00000
> Bloom Filter Space Used: 688464
> Compacted row minimum size: 61
> Compacted row maximum size: 3311
> Compacted row mean size: 1235
>
> And the cfstats of 153:
>
> Column Family: CF_SBMessages
> SSTable count: 5
> Space used (live): 965495648
> Space used (total): 965495648
> Number of Keys (estimate): 3257216
> Memtable Columns Count: 56541
> Memtable Data Size: 15699960
> Memtable Switch Count: 2
> Read Count: 22475
> Read Latency: 0.142 ms.
> Write Count: 21719
> Write Latency: 0.073 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 43
> Bloom Filter False Ratio: 0.19545
> Bloom Filter Space Used: 6161032
> Compacted row minimum size: 536
> Compacted row maximum size: 11864
> Compacted row mean size: 872
>
> Column Family: CF_users
> SSTable count: 2
> Space used (live): 148762893
> Space used (total): 148762893
> Number of Keys (estimate): 333312
> Memtable Columns Count: 129725
> Memtable Data Size: 17144125
> Memtable Switch Count: 4
> Read Count: 7440
> Read Latency: 1.329 ms.
> Write Count: 127465
> Write Latency: 0.093 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 0
> Bloom Filter False Ratio: 0.00000
> Bloom Filter Space Used: 694112
> Compacted row minimum size: 61
> Compacted row maximum size: 3311
> Compacted row mean size: 1298
>
> The messages are in a keyspace with a replication factor of 2 and the
> users are in a keyspace with a replication factor of 3.
>
> Ah, and we use the RandomPartitioner.
>
> Thanks again
>
>
> 2013/1/30 Guillermo Barbero <gu...@spotbros.com>
>>
>> Guys,
>>
>> First thanks a lot for your answers.. now I have a little more info, lets see if this helps:
>>
>> As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155) (Amazon EC2 XL instances 17GB RAM) and the problem is that in the last weeks we are seing that the performace of the cluster falls.. PHP commands that usually lasted a few milliseconds last up to 15 seconds for just 2/3 minutes every now and then (no specific pattern) (see commands timing in attached picture).
>>
>> Today we have found this:
>>
>> Prior to a fall in performance on all Cassandra nodes log I can see this:
>>
>> Node 150:
>>
>> INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>> INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>> INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>> INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
>> INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
>> INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db (84559 bytes) for commitlog position ReplayPosition(segmentId=1355151995930, position=32659991)
>> INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]
>>
>> Node 151:
>>
>> INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>> INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>> INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>> INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
>> INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
>> INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db (93711 bytes) for commitlog position ReplayPosition(segmentId=1354832000013, position=20033362)
>>
>> Node 153:
>>
>> INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line 122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is 3886022656
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line 122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is 3886022656
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line 122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is 3886022656
>> INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.4195889775077013 (just-counted was 1.4195889775077013). calculation took 109ms for 64 columns
>> INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.5276925191259334 (just-counted was 1.5276925191259334). calculation took 11ms for 129 columns
>> INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>> INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>> INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.561312033901826 (just-counted was 1.561312033901826). calculation took 234ms for 144 columns
>> INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1120216043(96278/187900 serialized/live bytes, 288 ops)
>>
>> Node 154:
>>
>> INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153
>> INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>> INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>> INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>> INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)
>> INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)
>>
>>
>> Node 155:
>>
>> INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>> INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>> INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>> INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1083199446(94578/190654 serialized/live bytes, 316 ops)
>>
>> Meanwhile on node 152:
>>
>> INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db (639626 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=11032499)
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line 122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is 3886022656
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line 57) Pool Name Active Pending Blocked
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line 72) ReadStage 32 343 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) RequestResponseStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) ReadRepairStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) MutationStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) ReplicateOnWriteStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) GossipStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) AntiEntropyStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) MigrationStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) StreamStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MemtablePostFlusher 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) FlushWriter 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MiscStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) commitlog_archiver 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) InternalResponseStage 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) HintedHandoff 0 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 77) CompactionManager 0 0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 89) MessagingService n/a 0,24
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 99) Cache Type Size Capacity KeysToSave Provider
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 100) KeyCache 133569 2184533 all
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 106) RowCache 0 0 all org.apache.cassandra.cache.SerializingCacheProvider
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 113) ColumnFamily Memtable ops,data
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 116) system.NodeIdInfo 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.IndexInfo 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.LocationInfo 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.Versions 3,103
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.schema_keyspaces 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line 116) system.Migrations 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line 116) system.schema_columnfamilies 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.schema_columns 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.HintsColumnFamily 1,2369
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.Schema 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_mailPipes 9241,30380203
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_profilePics 427,1173642
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBVars 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBCounters 5,232
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line 116) SB_NNCD.CF_userPhoneAgenda 7919,2332898
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_BSLUsers 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_mailTimeline 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_SBMessages 54390,15054201
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_NewsFeedTimeline 1866,7681861
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_BSLSpots 1,616
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBUniqueIds 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBChecks 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_NFTimelines 423,533156
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBAppUsersProfilePic 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBLikes 16,565
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBCodes 154,93735
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_attachmentsPerUser 80,107551
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_comments 58,77581
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_attachments 641,247796
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_SBStats 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_userReversedPhoneAgenda 9047,7132129
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_ACD.CF_GSMembers 823,1206779
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line 116) SB_ACD.CF_NSLoad 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line 116) SB_ACD.CF_sessions 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line 116) SB_ACD.CF_userGS 193,577229
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_userApps 8,170
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_GSData 126,3503
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line 116) SB_ACD.CF_R2Notifications 1906,58481
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line 116) SB_ACD.CF_uniqueKeys 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line 116) SB_ACD.CF_userContactWannabes 953,438671
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line 116) SB_ACD.CF_SBApps 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line 116) SB_ACD.CF_SBAppUsers 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_R1Notifications 1354,2070513
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_SBVotes 463,230386
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line 116) SB_ACD.CF_userContacts 2968,4849519
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_sessionsSuspended 0,0
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_userBlockedList 5,207
>> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line 116) SB_ACD.CF_users 65386,10852776
>> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line 145) Heap is 0.7514124302624755 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line 2880) Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to relieve memory pressure
>> INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes, 9380 ops)
>> INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264) Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes, 9380 ops)
>> INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db (2049401 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=13578602)
>> WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202) setting live ratio to maximum of 64.0 instead of 751.6512549537648
>> INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213) CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is 64.0 (just-counted was 64.0). calculation took 61ms for 8 columns
>> INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213) CFS(Keyspace='SB_ACD', ColumnFamily='CF_userGS') liveRatio is 5.309650177091864 (just-counted was 5.309650177091864). calculation took 11ms for 170 columns
>> INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java (line 296) Started hinted handoff for token: 85070591730234615865843651857942052864 with IP: /10.0.0.153
>> INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2 ops)
>> INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2 ops)
>> INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db (1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=17323736)
>> INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]
>>
>> The cluster is perfectly balanced:
>>
>> Note: Ownership information does not include topology, please specify a keyspace.
>> Address DC Rack Status State Load Owns Token
>> 141784319550391032739561396922763706368
>> 10.0.0.150 datacenter1 rack1 Up Normal 13.25 GB 16.67% 0
>> 10.0.0.151 datacenter1 rack1 Up Normal 10.31 GB 16.67% 28356863910078203714492389662765613056
>> 10.0.0.152 datacenter1 rack1 Up Normal 8.58 GB 16.67% 56713727820156407428984779325531226112
>> 10.0.0.153 datacenter1 rack1 Up Normal 8.1 GB 16.67% 85070591730234615865843651857942052864
>> 10.0.0.154 datacenter1 rack1 Up Normal 8.28 GB 16.67% 113427455640312814857969558651062452224
>> 10.0.0.155 datacenter1 rack1 Up Normal 9.38 GB 16.67% 141784319550391032739561396922763706368
>>
>>
>> I guess something happens on 152 that makes other nodes believe it is dead, then does something to fix itself and comes back after (messing up performance in the meanwhile) .Does this makes any sense? Any clue of what might be going on? I can post more logs if you guys need
>>
>> Thanks again!!
>>
>> Guillermo
>>
>>
>>
>> 2013/1/30 Nate McCall <zz...@gmail.com>
>>>
>>> What's the output of nodetool cfstats for those 2 column families on
>>> cassNode2 and cassNode3? And what is the replication factor for this
>>> cluster?
>>>
>>> Per the previous reply, nodetool ring should show each of your nodes
>>> with ~16.7% of the data if well balanced.
>>>
>>> Also, the auto-detection for memory sizes in the startup script is a
>>> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
>>> ram. It usually ends up allocating 4g/400m (max and young) whereas
>>> 8g/800m will give you some more breathing room.
>>>
>>> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <bt...@aeriagames.com> wrote:
>>> > My guess is that those one or two nodes with the gc pressure also have more
>>> > rows in your big CF. More rows could be due to imbalanced distribution if
>>> > your'e not using a random partitioner or from those nodes not yet removing
>>> > deleted rows which other nodes may have done.
>>> >
>>> > JVM heap space is used for a few things which scale with key count
>>> > including:
>>> > - bloom filter (for C* < 1.2)
>>> > - index samples
>>> >
>>> > Other space is used but can be more easily controlled by tuning for
>>> > - memtable
>>> > - compaction
>>> > - key cache
>>> > - row cache
>>> >
>>> >
>>> > So, if those nodes have more rows (check using "nodetool ring" or "nodetool
>>> > cfstats") than the others you can try to:
>>> > - reduce the number of rows by adding nodes, run manual / tune compactions
>>> > to remove rows with expired tombstones, etc.
>>> > - increase bloom filter fp chance
>>> > - increase jvm heap size (don't go too big)
>>> > - disable key or row cache
>>> > - increase index sample interval
>>> >
>>> > Not all of those things are generally good especially to the extreme so
>>> > don't go setting a 20 GB jvm heap without understanding the consequences for
>>> > example.
>>> >
>>> > -Bryan
>>> >
>>> >
>>> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
>>> > <gu...@spotbros.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I'm viewing a weird behaviour in my cassandra cluster. Most of the
>>> >> warning messages are due to Heap is % full. According to this link
>>> >>
>>> >> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
>>> >> there are two ways to "reduce pressure":
>>> >> 1. Decrease the cache sizes
>>> >> 2. Increase the index interval size
>>> >>
>>> >> Most of the flushes are in two column families (users and messages), I
>>> >> guess that's because the most mutations are there.
>>> >>
>>> >> I still have not applied those changes to the production environment.
>>> >> Do you recommend any other meassure? Should I set specific tunning for
>>> >> these two CFs? Should I check another metric?
>>> >>
>>> >> Additionally, the distribution of warning messages is not uniform
>>> >> along the cluster. Why could cassandra be doing this? What should I do
>>> >> to find out how to fix this?
>>> >>
>>> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>>> >> the java version is the following:
>>> >> java version "1.6.0_37"
>>> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>>> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>>> >>
>>> >> The cassandra system.log is resumed here (numer of messages, cassandra
>>> >> node, class that reports the message, first word of the message)
>>> >> 2013-01-26
>>> >> 5 cassNode0: GCInspector.java Heap
>>> >> 5 cassNode0: StorageService.java Flushing
>>> >> 232 cassNode2: GCInspector.java Heap
>>> >> 232 cassNode2: StorageService.java Flushing
>>> >> 104 cassNode3: GCInspector.java Heap
>>> >> 104 cassNode3: StorageService.java Flushing
>>> >> 3 cassNode4: GCInspector.java Heap
>>> >> 3 cassNode4: StorageService.java Flushing
>>> >> 3 cassNode5: GCInspector.java Heap
>>> >> 3 cassNode5: StorageService.java Flushing
>>> >>
>>> >> 2013-01-27
>>> >> 2 cassNode0: GCInspector.java Heap
>>> >> 2 cassNode0: StorageService.java Flushing
>>> >> 3 cassNode1: GCInspector.java Heap
>>> >> 3 cassNode1: StorageService.java Flushing
>>> >> 189 cassNode2: GCInspector.java Heap
>>> >> 189 cassNode2: StorageService.java Flushing
>>> >> 104 cassNode3: GCInspector.java Heap
>>> >> 104 cassNode3: StorageService.java Flushing
>>> >> 1 cassNode4: GCInspector.java Heap
>>> >> 1 cassNode4: StorageService.java Flushing
>>> >> 1 cassNode5: GCInspector.java Heap
>>> >> 1 cassNode5: StorageService.java Flushing
>>> >>
>>> >> 2013-01-28
>>> >> 2 cassNode0: GCInspector.java Heap
>>> >> 2 cassNode0: StorageService.java Flushing
>>> >> 1 cassNode1: GCInspector.java Heap
>>> >> 1 cassNode1: StorageService.java Flushing
>>> >> 1 cassNode2: AutoSavingCache.java Reducing
>>> >> 343 cassNode2: GCInspector.java Heap
>>> >> 342 cassNode2: StorageService.java Flushing
>>> >> 181 cassNode3: GCInspector.java Heap
>>> >> 181 cassNode3: StorageService.java Flushing
>>> >> 4 cassNode4: GCInspector.java Heap
>>> >> 4 cassNode4: StorageService.java Flushing
>>> >> 3 cassNode5: GCInspector.java Heap
>>> >> 3 cassNode5: StorageService.java Flushing
>>> >>
>>> >> 2013-01-29
>>> >> 2 cassNode0: GCInspector.java Heap
>>> >> 2 cassNode0: StorageService.java Flushing
>>> >> 3 cassNode1: GCInspector.java Heap
>>> >> 3 cassNode1: StorageService.java Flushing
>>> >> 156 cassNode2: GCInspector.java Heap
>>> >> 156 cassNode2: StorageService.java Flushing
>>> >> 71 cassNode3: GCInspector.java Heap
>>> >> 71 cassNode3: StorageService.java Flushing
>>> >> 2 cassNode4: GCInspector.java Heap
>>> >> 2 cassNode4: StorageService.java Flushing
>>> >> 2 cassNode5: GCInspector.java Heap
>>> >> 1 cassNode5: Memtable.java setting
>>> >> 2 cassNode5: StorageService.java Flushing
>>> >>
>>> >> --
>>> >>
>>> >> Guillermo Barbero - Backend Team
>>> >>
>>> >> Spotbros Technologies
>>> >
>>> >
>>> >
>>
>>
>>
>>
>> --
>>
>> Guillermo Barbero - Backend Team
>>
>> Spotbros Technologies
>>
>>
>>
>
>
>
> --
>
> Guillermo Barbero - Backend Team
>
> Spotbros Technologies
Re: too many warnings of Heap is full
Posted by Guillermo Barbero <gu...@spotbros.com>.
Iep,
I missed the attachment...
Also, these are the cfstats of 152:
Column Family: CF_SBMessages
SSTable count: 3
Space used (live): 967238560
Space used (total): 967238560
Number of Keys (estimate): 3263232
Memtable Columns Count: 2112
Memtable Data Size: 577135
Memtable Switch Count: 5
Read Count: 20702
Read Latency: 0.181 ms.
Write Count: 19888
Write Latency: 0.059 ms.
Pending Tasks: 0
Bloom Filter False Postives: 14
Bloom Filter False Ratio: 0.09375
Bloom Filter Space Used: 6171168
Compacted row minimum size: 536
Compacted row maximum size: 24601
Compacted row mean size: 850
Column Family: CF_users
SSTable count: 3
Space used (live): 152205376
Space used (total): 152205376
Number of Keys (estimate): 343040
Memtable Columns Count: 154398
Memtable Data Size: 21159410
Memtable Switch Count: 9
Read Count: 11816
Read Latency: 1.348 ms.
Write Count: 128751
Write Latency: 0.090 ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 688464
Compacted row minimum size: 61
Compacted row maximum size: 3311
Compacted row mean size: 1235
And the cfstats of 153:
Column Family: CF_SBMessages
SSTable count: 5
Space used (live): 965495648
Space used (total): 965495648
Number of Keys (estimate): 3257216
Memtable Columns Count: 56541
Memtable Data Size: 15699960
Memtable Switch Count: 2
Read Count: 22475
Read Latency: 0.142 ms.
Write Count: 21719
Write Latency: 0.073 ms.
Pending Tasks: 0
Bloom Filter False Postives: 43
Bloom Filter False Ratio: 0.19545
Bloom Filter Space Used: 6161032
Compacted row minimum size: 536
Compacted row maximum size: 11864
Compacted row mean size: 872
Column Family: CF_users
SSTable count: 2
Space used (live): 148762893
Space used (total): 148762893
Number of Keys (estimate): 333312
Memtable Columns Count: 129725
Memtable Data Size: 17144125
Memtable Switch Count: 4
Read Count: 7440
Read Latency: 1.329 ms.
Write Count: 127465
Write Latency: 0.093 ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 694112
Compacted row minimum size: 61
Compacted row maximum size: 3311
Compacted row mean size: 1298
The messages are in a keyspace with a replication factor of 2 and the
users are in a keyspace with a replication factor of 3.
Ah, and we use the RandomPartitioner.
Thanks again
2013/1/30 Guillermo Barbero <gu...@spotbros.com>
>
> Guys,
>
> First thanks a lot for your answers.. now I have a little more info, lets see if this helps:
>
> As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155) (Amazon EC2 XL instances 17GB RAM) and the problem is that in the last weeks we are seing that the performace of the cluster falls.. PHP commands that usually lasted a few milliseconds last up to 15 seconds for just 2/3 minutes every now and then (no specific pattern) (see commands timing in attached picture).
>
> Today we have found this:
>
> Prior to a fall in performance on all Cassandra nodes log I can see this:
>
> Node 150:
>
> INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
> INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
> INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
> INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
> INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
> INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db (84559 bytes) for commitlog position ReplayPosition(segmentId=1355151995930, position=32659991)
> INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]
>
> Node 151:
>
> INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
> INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
> INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
> INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
> INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
> INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db (93711 bytes) for commitlog position ReplayPosition(segmentId=1354832000013, position=20033362)
>
> Node 153:
>
> INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
> INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line 122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is 3886022656
> INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line 122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is 3886022656
> INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line 122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is 3886022656
> INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.4195889775077013 (just-counted was 1.4195889775077013). calculation took 109ms for 64 columns
> INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.5276925191259334 (just-counted was 1.5276925191259334). calculation took 11ms for 129 columns
> INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
> INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
> INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.561312033901826 (just-counted was 1.561312033901826). calculation took 234ms for 144 columns
> INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1120216043(96278/187900 serialized/live bytes, 288 ops)
>
> Node 154:
>
> INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153
> INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
> INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
> INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
> INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)
> INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)
>
>
> Node 155:
>
> INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
> INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
> INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
> INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1083199446(94578/190654 serialized/live bytes, 316 ops)
>
> Meanwhile on node 152:
>
> INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db (639626 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=11032499)
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line 122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is 3886022656
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line 57) Pool Name Active Pending Blocked
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line 72) ReadStage 32 343 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) RequestResponseStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) ReadRepairStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) MutationStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) ReplicateOnWriteStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) GossipStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) AntiEntropyStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) MigrationStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) StreamStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MemtablePostFlusher 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) FlushWriter 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MiscStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) commitlog_archiver 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) InternalResponseStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) HintedHandoff 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 77) CompactionManager 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 89) MessagingService n/a 0,24
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 99) Cache Type Size Capacity KeysToSave Provider
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 100) KeyCache 133569 2184533 all
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 106) RowCache 0 0 all org.apache.cassandra.cache.SerializingCacheProvider
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 113) ColumnFamily Memtable ops,data
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 116) system.NodeIdInfo 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.IndexInfo 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.LocationInfo 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.Versions 3,103
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.schema_keyspaces 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line 116) system.Migrations 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line 116) system.schema_columnfamilies 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.schema_columns 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.HintsColumnFamily 1,2369
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.Schema 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_mailPipes 9241,30380203
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_profilePics 427,1173642
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBVars 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBCounters 5,232
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line 116) SB_NNCD.CF_userPhoneAgenda 7919,2332898
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_BSLUsers 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_mailTimeline 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_SBMessages 54390,15054201
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_NewsFeedTimeline 1866,7681861
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_BSLSpots 1,616
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBUniqueIds 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBChecks 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_NFTimelines 423,533156
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBAppUsersProfilePic 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBLikes 16,565
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBCodes 154,93735
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_attachmentsPerUser 80,107551
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_comments 58,77581
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_attachments 641,247796
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_SBStats 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_userReversedPhoneAgenda 9047,7132129
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_ACD.CF_GSMembers 823,1206779
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line 116) SB_ACD.CF_NSLoad 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line 116) SB_ACD.CF_sessions 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line 116) SB_ACD.CF_userGS 193,577229
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_userApps 8,170
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_GSData 126,3503
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line 116) SB_ACD.CF_R2Notifications 1906,58481
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line 116) SB_ACD.CF_uniqueKeys 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line 116) SB_ACD.CF_userContactWannabes 953,438671
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line 116) SB_ACD.CF_SBApps 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line 116) SB_ACD.CF_SBAppUsers 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_R1Notifications 1354,2070513
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_SBVotes 463,230386
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line 116) SB_ACD.CF_userContacts 2968,4849519
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_sessionsSuspended 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_userBlockedList 5,207
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line 116) SB_ACD.CF_users 65386,10852776
> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line 145) Heap is 0.7514124302624755 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line 2880) Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to relieve memory pressure
> INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes, 9380 ops)
> INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264) Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes, 9380 ops)
> INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db (2049401 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=13578602)
> WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202) setting live ratio to maximum of 64.0 instead of 751.6512549537648
> INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213) CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is 64.0 (just-counted was 64.0). calculation took 61ms for 8 columns
> INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213) CFS(Keyspace='SB_ACD', ColumnFamily='CF_userGS') liveRatio is 5.309650177091864 (just-counted was 5.309650177091864). calculation took 11ms for 170 columns
> INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java (line 296) Started hinted handoff for token: 85070591730234615865843651857942052864 with IP: /10.0.0.153
> INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2 ops)
> INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2 ops)
> INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db (1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=17323736)
> INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]
>
> The cluster is perfectly balanced:
>
> Note: Ownership information does not include topology, please specify a keyspace.
> Address DC Rack Status State Load Owns Token
> 141784319550391032739561396922763706368
> 10.0.0.150 datacenter1 rack1 Up Normal 13.25 GB 16.67% 0
> 10.0.0.151 datacenter1 rack1 Up Normal 10.31 GB 16.67% 28356863910078203714492389662765613056
> 10.0.0.152 datacenter1 rack1 Up Normal 8.58 GB 16.67% 56713727820156407428984779325531226112
> 10.0.0.153 datacenter1 rack1 Up Normal 8.1 GB 16.67% 85070591730234615865843651857942052864
> 10.0.0.154 datacenter1 rack1 Up Normal 8.28 GB 16.67% 113427455640312814857969558651062452224
> 10.0.0.155 datacenter1 rack1 Up Normal 9.38 GB 16.67% 141784319550391032739561396922763706368
>
>
> I guess something happens on 152 that makes other nodes believe it is dead, then does something to fix itself and comes back after (messing up performance in the meanwhile) .Does this makes any sense? Any clue of what might be going on? I can post more logs if you guys need
>
> Thanks again!!
>
> Guillermo
>
>
>
> 2013/1/30 Nate McCall <zz...@gmail.com>
>>
>> What's the output of nodetool cfstats for those 2 column families on
>> cassNode2 and cassNode3? And what is the replication factor for this
>> cluster?
>>
>> Per the previous reply, nodetool ring should show each of your nodes
>> with ~16.7% of the data if well balanced.
>>
>> Also, the auto-detection for memory sizes in the startup script is a
>> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
>> ram. It usually ends up allocating 4g/400m (max and young) whereas
>> 8g/800m will give you some more breathing room.
>>
>> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <bt...@aeriagames.com> wrote:
>> > My guess is that those one or two nodes with the gc pressure also have more
>> > rows in your big CF. More rows could be due to imbalanced distribution if
>> > your'e not using a random partitioner or from those nodes not yet removing
>> > deleted rows which other nodes may have done.
>> >
>> > JVM heap space is used for a few things which scale with key count
>> > including:
>> > - bloom filter (for C* < 1.2)
>> > - index samples
>> >
>> > Other space is used but can be more easily controlled by tuning for
>> > - memtable
>> > - compaction
>> > - key cache
>> > - row cache
>> >
>> >
>> > So, if those nodes have more rows (check using "nodetool ring" or "nodetool
>> > cfstats") than the others you can try to:
>> > - reduce the number of rows by adding nodes, run manual / tune compactions
>> > to remove rows with expired tombstones, etc.
>> > - increase bloom filter fp chance
>> > - increase jvm heap size (don't go too big)
>> > - disable key or row cache
>> > - increase index sample interval
>> >
>> > Not all of those things are generally good especially to the extreme so
>> > don't go setting a 20 GB jvm heap without understanding the consequences for
>> > example.
>> >
>> > -Bryan
>> >
>> >
>> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
>> > <gu...@spotbros.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I'm viewing a weird behaviour in my cassandra cluster. Most of the
>> >> warning messages are due to Heap is % full. According to this link
>> >>
>> >> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
>> >> there are two ways to "reduce pressure":
>> >> 1. Decrease the cache sizes
>> >> 2. Increase the index interval size
>> >>
>> >> Most of the flushes are in two column families (users and messages), I
>> >> guess that's because the most mutations are there.
>> >>
>> >> I still have not applied those changes to the production environment.
>> >> Do you recommend any other meassure? Should I set specific tunning for
>> >> these two CFs? Should I check another metric?
>> >>
>> >> Additionally, the distribution of warning messages is not uniform
>> >> along the cluster. Why could cassandra be doing this? What should I do
>> >> to find out how to fix this?
>> >>
>> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>> >> the java version is the following:
>> >> java version "1.6.0_37"
>> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>> >>
>> >> The cassandra system.log is resumed here (numer of messages, cassandra
>> >> node, class that reports the message, first word of the message)
>> >> 2013-01-26
>> >> 5 cassNode0: GCInspector.java Heap
>> >> 5 cassNode0: StorageService.java Flushing
>> >> 232 cassNode2: GCInspector.java Heap
>> >> 232 cassNode2: StorageService.java Flushing
>> >> 104 cassNode3: GCInspector.java Heap
>> >> 104 cassNode3: StorageService.java Flushing
>> >> 3 cassNode4: GCInspector.java Heap
>> >> 3 cassNode4: StorageService.java Flushing
>> >> 3 cassNode5: GCInspector.java Heap
>> >> 3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-27
>> >> 2 cassNode0: GCInspector.java Heap
>> >> 2 cassNode0: StorageService.java Flushing
>> >> 3 cassNode1: GCInspector.java Heap
>> >> 3 cassNode1: StorageService.java Flushing
>> >> 189 cassNode2: GCInspector.java Heap
>> >> 189 cassNode2: StorageService.java Flushing
>> >> 104 cassNode3: GCInspector.java Heap
>> >> 104 cassNode3: StorageService.java Flushing
>> >> 1 cassNode4: GCInspector.java Heap
>> >> 1 cassNode4: StorageService.java Flushing
>> >> 1 cassNode5: GCInspector.java Heap
>> >> 1 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-28
>> >> 2 cassNode0: GCInspector.java Heap
>> >> 2 cassNode0: StorageService.java Flushing
>> >> 1 cassNode1: GCInspector.java Heap
>> >> 1 cassNode1: StorageService.java Flushing
>> >> 1 cassNode2: AutoSavingCache.java Reducing
>> >> 343 cassNode2: GCInspector.java Heap
>> >> 342 cassNode2: StorageService.java Flushing
>> >> 181 cassNode3: GCInspector.java Heap
>> >> 181 cassNode3: StorageService.java Flushing
>> >> 4 cassNode4: GCInspector.java Heap
>> >> 4 cassNode4: StorageService.java Flushing
>> >> 3 cassNode5: GCInspector.java Heap
>> >> 3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-29
>> >> 2 cassNode0: GCInspector.java Heap
>> >> 2 cassNode0: StorageService.java Flushing
>> >> 3 cassNode1: GCInspector.java Heap
>> >> 3 cassNode1: StorageService.java Flushing
>> >> 156 cassNode2: GCInspector.java Heap
>> >> 156 cassNode2: StorageService.java Flushing
>> >> 71 cassNode3: GCInspector.java Heap
>> >> 71 cassNode3: StorageService.java Flushing
>> >> 2 cassNode4: GCInspector.java Heap
>> >> 2 cassNode4: StorageService.java Flushing
>> >> 2 cassNode5: GCInspector.java Heap
>> >> 1 cassNode5: Memtable.java setting
>> >> 2 cassNode5: StorageService.java Flushing
>> >>
>> >> --
>> >>
>> >> Guillermo Barbero - Backend Team
>> >>
>> >> Spotbros Technologies
>> >
>> >
>> >
>
>
>
>
> --
>
> Guillermo Barbero - Backend Team
>
> Spotbros Technologies
>
>
>
--
Guillermo Barbero - Backend Team
Spotbros Technologies
Re: too many warnings of Heap is full
Posted by Derek Williams <de...@fyrie.net>.
My first guess is the 3 ParNew collections taking 35 seconds is why your
clients had long response times. That is a very abnormal stat. Combined
with the low memory usage at that time, it seems highly unlikely there was
actually 35 seconds worth of GCing going on there. Unless there is anything
interesting in the lines above that snippet.
I'd first double check that your new generation hasn't
been accidentally set to something insanely large or small, just to rule
that out (but judging from the 4G of heap that is set you are most likely
using defaults, which is good). After that I'd try replacing that node as
it could be a hardware problem.
Another reason I have seen large pauses on a linux VPS that might be worth
looking into is low entropy (possibly due to someone else abusing your
shared hardware).
On Wed, Jan 30, 2013 at 3:44 PM, Guillermo Barbero <
guillermo.barbero@spotbros.com> wrote:
> Guys,
>
> First thanks a lot for your answers.. now I have a little more info, lets
> see if this helps:
>
> As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155)
> (Amazon EC2 XL instances 17GB RAM) and the problem is that in the last
> weeks we are seing that the performace of the cluster falls.. PHP commands
> that usually lasted a few milliseconds last up to 15 seconds for just 2/3
> minutes every now and then (no specific pattern) (see commands timing in
> attached picture).
>
> Today we have found this:
>
> Prior to a fall in performance on all Cassandra nodes log I can see this:
>
> *Node 150:*
>
> INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
> INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
> INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
> INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1264710747(84317/476272
> serialized/live bytes, 298 ops)
> INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@1264710747(84317/476272
> serialized/live bytes, 298 ops)
> INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db
> (84559 bytes) for commitlog position
> ReplayPosition(segmentId=1355151995930, position=32659991)
> INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283
> CompactionTask.java (line 109) Compacting
> [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'),
> SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]
>
> *Node 151:*
> *
> *
> INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
> INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
> INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
> INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1453632632(93461/191418
> serialized/live bytes, 316 ops)
> INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@1453632632(93461/191418
> serialized/live bytes, 316 ops)
> INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db
> (93711 bytes) for commitlog position
> ReplayPosition(segmentId=1354832000013, position=20033362)
>
> *Node 153:*
>
> * *INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
> INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line
> 122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is
> 3886022656
> INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line
> 122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is
> 3886022656
> INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line
> 122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is
> 3886022656
> INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213)
> CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
> 1.4195889775077013 (just-counted was 1.4195889775077013). calculation took
> 109ms for 64 columns
> INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213)
> CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
> 1.5276925191259334 (just-counted was 1.5276925191259334). calculation took
> 11ms for 129 columns
> INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
> INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
> INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213)
> CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
> 1.561312033901826 (just-counted was 1.561312033901826). calculation took
> 234ms for 144 columns
> INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1120216043(96278/187900
> serialized/live bytes, 288 ops)
>
> *Node 154:*
> *
> *
> INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153
> INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
> INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
> INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
> INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145
> serialized/live bytes, 286 ops)
> INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live
> bytes, 286 ops)
>
>
> Node 155:
>
> INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
> INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
> INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
> INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1083199446(94578/190654
> serialized/live bytes, 316 ops)
>
> *Meanwhile on node 152:*
>
> INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db
> (639626 bytes) for commitlog position
> ReplayPosition(segmentId=1359573115280, position=11032499)
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line
> 122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is
> 3886022656
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line
> 57) Pool Name Active Pending Blocked
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line
> 72) ReadStage 32 343 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
> 72) RequestResponseStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
> 72) ReadRepairStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
> 72) MutationStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
> 72) ReplicateOnWriteStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
> 72) GossipStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
> 72) AntiEntropyStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
> 72) MigrationStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
> 72) StreamStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
> 72) MemtablePostFlusher 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
> 72) FlushWriter 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
> 72) MiscStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
> 72) commitlog_archiver 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
> 72) InternalResponseStage 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
> 72) HintedHandoff 0 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
> 77) CompactionManager 0 0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
> 89) MessagingService n/a 0,24
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
> 99) Cache Type Size Capacity
> KeysToSave
> Provider
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 100) KeyCache 133569 2184533
> all
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 106) RowCache 0 0
> all
> org.apache.cassandra.cache.SerializingCacheProvider
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 113) ColumnFamily Memtable ops,data
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 116) system.NodeIdInfo 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.IndexInfo 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.LocationInfo 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.Versions 3,103
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.schema_keyspaces 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line
> 116) system.Migrations 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line
> 116) system.schema_columnfamilies 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
> 116) system.schema_columns 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
> 116) system.HintsColumnFamily 1,2369
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
> 116) system.Schema 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_mailPipes 9241,30380203
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_profilePics 427,1173642
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_SBVars 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_SBCounters 5,232
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line
> 116) SB_NNCD.CF_userPhoneAgenda 7919,2332898
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_BSLUsers 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_mailTimeline 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_SBMessages 54390,15054201
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_NewsFeedTimeline 1866,7681861
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_BSLSpots 1,616
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_SBUniqueIds 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_SBChecks 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_NFTimelines 423,533156
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_SBAppUsersProfilePic 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_SBLikes 16,565
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_SBCodes 154,93735
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_attachmentsPerUser 80,107551
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_comments 58,77581
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_NNCD.CF_attachments 641,247796
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_NNCD.CF_SBStats 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_NNCD.CF_userReversedPhoneAgenda 9047,7132129
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_ACD.CF_GSMembers 823,1206779
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line
> 116) SB_ACD.CF_NSLoad 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line
> 116) SB_ACD.CF_sessions 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line
> 116) SB_ACD.CF_userGS 193,577229
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
> 116) SB_ACD.CF_userApps 8,170
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
> 116) SB_ACD.CF_GSData 126,3503
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line
> 116) SB_ACD.CF_R2Notifications 1906,58481
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line
> 116) SB_ACD.CF_uniqueKeys 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line
> 116) SB_ACD.CF_userContactWannabes 953,438671
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line
> 116) SB_ACD.CF_SBApps 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line
> 116) SB_ACD.CF_SBAppUsers 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
> 116) SB_ACD.CF_R1Notifications 1354,2070513
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
> 116) SB_ACD.CF_SBVotes 463,230386
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line
> 116) SB_ACD.CF_userContacts 2968,4849519
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
> 116) SB_ACD.CF_sessionsSuspended 0,0
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
> 116) SB_ACD.CF_userBlockedList 5,207
> INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line
> 116) SB_ACD.CF_users 65386,10852776
> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line
> 145) Heap is 0.7514124302624755 full. You may need to reduce memtable
> and/or cache sizes. Cassandra will now flush up to the two largest
> memtables to free up memory. Adjust flush_largest_memtables_at threshold
> in cassandra.yaml if you don't want Cassandra to do this automatically
> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line
> 2880) Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to
> relieve memory pressure
> INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355
> serialized/live bytes, 9380 ops)
> INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264)
> Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live
> bytes, 9380 ops)
> INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db
> (2049401 bytes) for commitlog position
> ReplayPosition(segmentId=1359573115280, position=13578602)
> WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
> setting live ratio to maximum of 64.0 instead of 751.6512549537648
> INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213)
> CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is 64.0
> (just-counted was 64.0). calculation took 61ms for 8 columns
> INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213)
> CFS(Keyspace='SB_ACD', ColumnFamily='CF_userGS') liveRatio is
> 5.309650177091864 (just-counted was 5.309650177091864). calculation took
> 11ms for 170 columns
> INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 85070591730234615865843651857942052864 with IP: /10.0.0.153
> INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369
> serialized/live bytes, 2 ops)
> INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live
> bytes, 2 ops)
> INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db
> (1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280,
> position=17323736)
> INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java
> (line 109) Compacting
> [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'),
> SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]
>
> The cluster is perfectly balanced:
>
> Note: Ownership information does not include topology, please specify a
> keyspace.
> Address DC Rack Status State Load
> Owns Token
>
> 141784319550391032739561396922763706368
> 10.0.0.150 datacenter1 rack1 Up Normal 13.25 GB
> 16.67% 0
> 10.0.0.151 datacenter1 rack1 Up Normal 10.31 GB
> 16.67% 28356863910078203714492389662765613056
> 10.0.0.152 datacenter1 rack1 Up Normal 8.58 GB
> 16.67% 56713727820156407428984779325531226112
> 10.0.0.153 datacenter1 rack1 Up Normal 8.1 GB
> 16.67% 85070591730234615865843651857942052864
> 10.0.0.154 datacenter1 rack1 Up Normal 8.28 GB
> 16.67% 113427455640312814857969558651062452224
> 10.0.0.155 datacenter1 rack1 Up Normal 9.38 GB
> 16.67% 141784319550391032739561396922763706368
>
>
> I guess something happens on 152 that makes other nodes believe it is
> dead, then does something to fix itself and comes back after (messing up
> performance in the meanwhile) .Does this makes any sense? Any clue of what
> might be going on? I can post more logs if you guys need
>
> Thanks again!!
>
> Guillermo
>
>
>
> 2013/1/30 Nate McCall <zz...@gmail.com>
>
>> What's the output of nodetool cfstats for those 2 column families on
>> cassNode2 and cassNode3? And what is the replication factor for this
>> cluster?
>>
>> Per the previous reply, nodetool ring should show each of your nodes
>> with ~16.7% of the data if well balanced.
>>
>> Also, the auto-detection for memory sizes in the startup script is a
>> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
>> ram. It usually ends up allocating 4g/400m (max and young) whereas
>> 8g/800m will give you some more breathing room.
>>
>> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <bt...@aeriagames.com>
>> wrote:
>> > My guess is that those one or two nodes with the gc pressure also have
>> more
>> > rows in your big CF. More rows could be due to imbalanced distribution
>> if
>> > your'e not using a random partitioner or from those nodes not yet
>> removing
>> > deleted rows which other nodes may have done.
>> >
>> > JVM heap space is used for a few things which scale with key count
>> > including:
>> > - bloom filter (for C* < 1.2)
>> > - index samples
>> >
>> > Other space is used but can be more easily controlled by tuning for
>> > - memtable
>> > - compaction
>> > - key cache
>> > - row cache
>> >
>> >
>> > So, if those nodes have more rows (check using "nodetool ring" or
>> "nodetool
>> > cfstats") than the others you can try to:
>> > - reduce the number of rows by adding nodes, run manual / tune
>> compactions
>> > to remove rows with expired tombstones, etc.
>> > - increase bloom filter fp chance
>> > - increase jvm heap size (don't go too big)
>> > - disable key or row cache
>> > - increase index sample interval
>> >
>> > Not all of those things are generally good especially to the extreme so
>> > don't go setting a 20 GB jvm heap without understanding the
>> consequences for
>> > example.
>> >
>> > -Bryan
>> >
>> >
>> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
>> > <gu...@spotbros.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I'm viewing a weird behaviour in my cassandra cluster. Most of the
>> >> warning messages are due to Heap is % full. According to this link
>> >>
>> >> (
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html
>> )
>> >> there are two ways to "reduce pressure":
>> >> 1. Decrease the cache sizes
>> >> 2. Increase the index interval size
>> >>
>> >> Most of the flushes are in two column families (users and messages), I
>> >> guess that's because the most mutations are there.
>> >>
>> >> I still have not applied those changes to the production environment.
>> >> Do you recommend any other meassure? Should I set specific tunning for
>> >> these two CFs? Should I check another metric?
>> >>
>> >> Additionally, the distribution of warning messages is not uniform
>> >> along the cluster. Why could cassandra be doing this? What should I do
>> >> to find out how to fix this?
>> >>
>> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>> >> the java version is the following:
>> >> java version "1.6.0_37"
>> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>> >>
>> >> The cassandra system.log is resumed here (numer of messages, cassandra
>> >> node, class that reports the message, first word of the message)
>> >> 2013-01-26
>> >> 5 cassNode0: GCInspector.java Heap
>> >> 5 cassNode0: StorageService.java Flushing
>> >> 232 cassNode2: GCInspector.java Heap
>> >> 232 cassNode2: StorageService.java Flushing
>> >> 104 cassNode3: GCInspector.java Heap
>> >> 104 cassNode3: StorageService.java Flushing
>> >> 3 cassNode4: GCInspector.java Heap
>> >> 3 cassNode4: StorageService.java Flushing
>> >> 3 cassNode5: GCInspector.java Heap
>> >> 3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-27
>> >> 2 cassNode0: GCInspector.java Heap
>> >> 2 cassNode0: StorageService.java Flushing
>> >> 3 cassNode1: GCInspector.java Heap
>> >> 3 cassNode1: StorageService.java Flushing
>> >> 189 cassNode2: GCInspector.java Heap
>> >> 189 cassNode2: StorageService.java Flushing
>> >> 104 cassNode3: GCInspector.java Heap
>> >> 104 cassNode3: StorageService.java Flushing
>> >> 1 cassNode4: GCInspector.java Heap
>> >> 1 cassNode4: StorageService.java Flushing
>> >> 1 cassNode5: GCInspector.java Heap
>> >> 1 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-28
>> >> 2 cassNode0: GCInspector.java Heap
>> >> 2 cassNode0: StorageService.java Flushing
>> >> 1 cassNode1: GCInspector.java Heap
>> >> 1 cassNode1: StorageService.java Flushing
>> >> 1 cassNode2: AutoSavingCache.java Reducing
>> >> 343 cassNode2: GCInspector.java Heap
>> >> 342 cassNode2: StorageService.java Flushing
>> >> 181 cassNode3: GCInspector.java Heap
>> >> 181 cassNode3: StorageService.java Flushing
>> >> 4 cassNode4: GCInspector.java Heap
>> >> 4 cassNode4: StorageService.java Flushing
>> >> 3 cassNode5: GCInspector.java Heap
>> >> 3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-29
>> >> 2 cassNode0: GCInspector.java Heap
>> >> 2 cassNode0: StorageService.java Flushing
>> >> 3 cassNode1: GCInspector.java Heap
>> >> 3 cassNode1: StorageService.java Flushing
>> >> 156 cassNode2: GCInspector.java Heap
>> >> 156 cassNode2: StorageService.java Flushing
>> >> 71 cassNode3: GCInspector.java Heap
>> >> 71 cassNode3: StorageService.java Flushing
>> >> 2 cassNode4: GCInspector.java Heap
>> >> 2 cassNode4: StorageService.java Flushing
>> >> 2 cassNode5: GCInspector.java Heap
>> >> 1 cassNode5: Memtable.java setting
>> >> 2 cassNode5: StorageService.java Flushing
>> >>
>> >> --
>> >>
>> >> Guillermo Barbero - Backend Team
>> >>
>> >> Spotbros Technologies
>> >
>> >
>> >
>>
>
>
>
> --
> <http://spotbros.com>
> Guillermo Barbero - Backend Team
>
> Spotbros Technologies
> <http://www.facebook.com/spotbros> <http://twitter.com/spotbros>
>
>
--
Derek Williams
Re: too many warnings of Heap is full
Posted by Guillermo Barbero <gu...@spotbros.com>.
> What is the cardinality like on these indexes? Can you provide the
> schema creation for these two column families?
This is the schema of the CFs:
create column family CF_users
with comparator = UTF8Type
and column_metadata =
[
{column_name: userSBCode, validation_class:
UTF8Type, index_type: KEYS},
{column_name: userEmail, validation_class:
UTF8Type, index_type: KEYS},
{column_name: userName, validation_class: UTF8Type},
{column_name: userLastName, validation_class: UTF8Type},
{column_name: userOwnPhoneKey,
validation_class: UTF8Type, index_type: KEYS},
{column_name: userOwnPhone,
validation_class: UTF8Type, index_type: KEYS},
{column_name: userPasswordMD5,
validation_class: UTF8Type},
{column_name: userDOB, validation_class: UTF8Type},
{column_name: userGender, validation_class: UTF8Type},
{column_name: userProfilePicMD5,
validation_class: UTF8Type},
{column_name: userAbout, validation_class: UTF8Type},
{column_name: userLastSession,
validation_class: UTF8Type}
{column_name: userMasterKey, validation_class: UTF8Type}
];
create column family CF_SBMessages
with comparator = UTF8Type
and column_metadata =
[
{column_name: SBMessageId, validation_class:
UTF8Type, index_type: KEYS},
{column_name: fromSBCode, validation_class:
UTF8Type, index_type: KEYS},
{column_name: SBMessageDate, validation_class:
UTF8Type, index_type: KEYS},
{column_name: SBMessageType, validation_class:
UTF8Type},
{column_name: SBMessageText, validation_class:
UTF8Type},
{column_name: SBMessageAttachments,
validation_class: UTF8Type},
];
I've read about the importance of keeping the cardinality of the
secondary indexes low (great article at
http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/),
and I'm afraid that we did completely the opposite (we did consider
the secondary indexes as alternate indexes).
I guess here is some work to do to create other CFs to store these
secondary indexes.
Anyway, I still don't understand why did appear these peaks (by the
way, last night there wasn't any)
Re: too many warnings of Heap is full
Posted by Nate McCall <zz...@gmail.com>.
>
> There are 4 secondary indexes for users and 3 for messages. I've seen
> something odd here: the index of the CFs is also stored in a column
> AND is also indexed. There isn't any query by that secondary index. I
> should probably erase the index, shouldn't I?
What is the cardinality like on these indexes? Can you provide the
schema creation for these two column families?
Re: too many warnings of Heap is full
Posted by Guillermo Barbero <gu...@spotbros.com>.
Hi, replies to your answers:
The CFs that seems to flush more often (users and messages) do not
have multiget queries...
There are 4 secondary indexes for users and 3 for messages. I've seen
something odd here: the index of the CFs is also stored in a column
AND is also indexed. There isn't any query by that secondary index. I
should probably erase the index, shouldn't I?
I post some lines of the log before 21:35:
Node 150:
INFO [HintedHandoff:1] 2013-01-30 21:07:43,795
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:07:43,796
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:17:43,796
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:17:43,796
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:27:43,796
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:27:43,797
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
Node 151:
INFO [HintedHandoff:1] 2013-01-30 21:04:45,821
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:04:45,822
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [ScheduledTasks:1] 2013-01-30 21:14:30,112 GCInspector.java
(line 122) GC for ParNew: 213 ms for 1 collections, 1920192480 used;
max is 3886022656
INFO [HintedHandoff:1] 2013-01-30 21:14:45,821
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:14:45,822
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:24:45,822
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:24:45,822
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [ScheduledTasks:1] 2013-01-30 21:28:25,294 GCInspector.java
(line 122) GC for ParNew: 236 ms for 1 collections, 2264010544 used;
max is 3886022656
INFO [HintedHandoff:1] 2013-01-30 21:34:45,822
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:34:45,822
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
Node 152:
INFO [HintedHandoff:1] 2013-01-30 21:02:03,706
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:02:03,707
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
WARN [MemoryMeter:1] 2013-01-30 21:09:57,220 Memtable.java (line 197)
setting live ratio to minimum of 1.0 instead of 0.15873949606408785
INFO [MemoryMeter:1] 2013-01-30 21:09:57,221 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_profilePics') liveRatio is
1.0 (just-counted was 1.0). calculation took 3ms for 301 columns
INFO [MemoryMeter:1] 2013-01-30 21:09:57,223 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD',
ColumnFamily='CF_profilePics.CF_profilePics_pPicMD5_idx') liveRatio is
18.952013237727524 (just-counted was 18.952013237727524). calculation
took 2ms for 74 columns
INFO [MemoryMeter:1] 2013-01-30 21:09:57,242 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD',
ColumnFamily='CF_profilePics.CF_profilePics_userSBCode_idx') liveRatio
is 12.820966340897906 (just-counted was 12.209597352454495).
calculation took 19ms for 74 columns
INFO [HintedHandoff:1] 2013-01-30 21:12:03,707
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:12:03,707
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [MemoryMeter:1] 2013-01-30 21:12:47,775 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_NewsFeedTimeline') liveRatio
is 3.605170679607766 (just-counted was 3.4883667978539283).
calculation took 110ms for 1462 columns
INFO [HintedHandoff:1] 2013-01-30 21:22:03,707
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:22:03,708
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [MemoryMeter:1] 2013-01-30 21:24:14,795 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_GSMembers') liveRatio is
5.785411930027831 (just-counted was 5.622629145582293). calculation
took 24ms for 463 columns
WARN [ScheduledTasks:1] 2013-01-30 21:28:12,744 GCInspector.java
(line 145) Heap is 0.7609735541387436 full. You may need to reduce
memtable and/or cache sizes. Cassandra will now flush up to the two
largest memtables to free up memory. Adjust
flush_largest_memtables_at threshold in cassandra.yaml if you don't
want Cassandra to do this automatically
WARN [ScheduledTasks:1] 2013-01-30 21:28:12,744 StorageService.java
(line 2880) Flushing CFS(Keyspace='SB_ACD', ColumnFamily='CF_users')
to relieve memory pressure
INFO [ScheduledTasks:1] 2013-01-30 21:28:12,745
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_users.CF_users_userEmail_idx@1131307878(146/182
serialized/live bytes, 4 ops)
INFO [ScheduledTasks:1] 2013-01-30 21:28:12,755
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_users.CF_users_userOwnPhone_idx@973751772(146/182
serialized/live bytes, 4 ops)
INFO [ScheduledTasks:1] 2013-01-30 21:28:12,755
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_users.CF_users_userOwnPhoneKey_idx@2020641864(146/182
serialized/live bytes, 4 ops)
INFO [ScheduledTasks:1] 2013-01-30 21:28:12,756
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_users.CF_users_userSBCode_idx@621414590(146/182
serialized/live bytes, 4 ops)
INFO [ScheduledTasks:1] 2013-01-30 21:28:12,756
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_users@902038769(9358482/45802966 serialized/live bytes,
405140 ops)
INFO [FlushWriter:11] 2013-01-30 21:28:12,757 Memtable.java (line
264) Writing Memtable-CF_users.CF_users_userEmail_idx@1131307878(146/182
serialized/live bytes, 4 ops)
INFO [FlushWriter:11] 2013-01-30 21:28:12,768 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userEmail_idx-hf-2231-Data.db
(234 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=6624766)
INFO [FlushWriter:11] 2013-01-30 21:28:12,781 Memtable.java (line
264) Writing Memtable-CF_users.CF_users_userOwnPhone_idx@973751772(146/182
serialized/live bytes, 4 ops)
INFO [FlushWriter:11] 2013-01-30 21:28:12,793 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2217-Data.db
(265 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=6624766)
INFO [FlushWriter:11] 2013-01-30 21:28:12,802 Memtable.java (line
264) Writing Memtable-CF_users.CF_users_userOwnPhoneKey_idx@2020641864(146/182
serialized/live bytes, 4 ops)
INFO [CompactionExecutor:42] 2013-01-30 21:28:12,813
CompactionTask.java (line 109) Compacting
[SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2217-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2215-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2216-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2214-Data.db')]
INFO [FlushWriter:11] 2013-01-30 21:28:12,814 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2209-Data.db
(268 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=6624766)
INFO [FlushWriter:11] 2013-01-30 21:28:12,824 Memtable.java (line
264) Writing Memtable-CF_users.CF_users_userSBCode_idx@621414590(146/182
serialized/live bytes, 4 ops)
INFO [CompactionExecutor:43] 2013-01-30 21:28:12,826
CompactionTask.java (line 109) Compacting
[SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2206-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2208-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2209-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2207-Data.db')]
INFO [FlushWriter:11] 2013-01-30 21:28:12,835 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userSBCode_idx-hf-2205-Data.db
(263 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=6624766)
INFO [FlushWriter:11] 2013-01-30 21:28:12,838 Memtable.java (line
264) Writing Memtable-CF_users@902038769(9358482/45802966
serialized/live bytes, 405140 ops)
INFO [FlushWriter:11] 2013-01-30 21:28:13,513 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users-hf-3989-Data.db
(3358487 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=6624766)
INFO [CompactionExecutor:42] 2013-01-30 21:28:28,209
CompactionTask.java (line 221) Compacted to
[/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2218-Data.db,].
12,758,536 to 12,758,266 (~99% of original) bytes for 298,733 keys at
0.790337MB/s. Time: 15,395ms.
INFO [CompactionExecutor:43] 2013-01-30 21:28:28,258
CompactionTask.java (line 221) Compacted to
[/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2210-Data.db,].
12,827,751 to 12,827,331 (~99% of original) bytes for 298,767 keys at
0.792812MB/s. Time: 15,430ms.
INFO [MemoryMeter:1] 2013-01-30 21:31:30,638 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_userPhoneAgenda') liveRatio
is 9.966352958143455 (just-counted was 6.201877882350339).
calculation took 24ms for 6574 columns
INFO [HintedHandoff:1] 2013-01-30 21:32:03,707
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:32:03,708
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
WARN [ScheduledTasks:1] 2013-01-30 21:33:41,079 GCInspector.java
(line 145) Heap is 0.7632044026868381 full. You may need to reduce
memtable and/or cache sizes. Cassandra will now flush up to the two
largest memtables to free up memory. Adjust
flush_largest_memtables_at threshold in cassandra.yaml if you don't
want Cassandra to do this automatically
WARN [ScheduledTasks:1] 2013-01-30 21:33:41,079 StorageService.java
(line 2880) Flushing CFS(Keyspace='SB_ACD',
ColumnFamily='CF_R2Notifications') to relieve memory pressure
INFO [ScheduledTasks:1] 2013-01-30 21:33:41,080
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_R2Notifications@1187465569(760532/33771575 serialized/live
bytes, 93136 ops)
INFO [FlushWriter:12] 2013-01-30 21:33:41,105 Memtable.java (line
264) Writing Memtable-CF_R2Notifications@1187465569(760532/33771575
serialized/live bytes, 93136 ops)
INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db
(639626 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=11032499)
Node 153:
INFO [HintedHandoff:1] 2013-01-30 21:01:45,310
HintedHandOffManager.java (line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
INFO [HintedHandoff:1] 2013-01-30 21:01:45,311
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.152
INFO [MemoryMeter:1] 2013-01-30 21:05:45,297 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_users') liveRatio is
3.9183034832767603 (just-counted was 3.9183034832767603). calculation
took 255ms for 58995 columns
INFO [MemoryMeter:1] 2013-01-30 21:07:24,736 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_NewsFeedTimeline') liveRatio
is 3.6493489644964034 (just-counted was 3.471672980060775).
calculation took 107ms for 1395 columns
INFO [ScheduledTasks:1] 2013-01-30 21:07:41,673 GCInspector.java
(line 122) GC for ParNew: 215 ms for 1 collections, 2310767440 used;
max is 3886022656
INFO [HintedHandoff:1] 2013-01-30 21:11:45,316
HintedHandOffManager.java (line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
INFO [HintedHandoff:1] 2013-01-30 21:11:45,318
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.152
INFO [MemoryMeter:1] 2013-01-30 21:14:40,052 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD',
ColumnFamily='CF_profilePics.CF_profilePics_pPicMD5_idx') liveRatio is
18.87210642710637 (just-counted was 18.854822335025382). calculation
took 3ms for 80 columns
INFO [MemoryMeter:1] 2013-01-30 21:14:40,054 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD',
ColumnFamily='CF_profilePics.CF_profilePics_userSBCode_idx') liveRatio
is 12.252182911928061 (just-counted was 11.273365071410675).
calculation took 2ms for 81 columns
WARN [MemoryMeter:1] 2013-01-30 21:14:40,072 Memtable.java (line 197)
setting live ratio to minimum of 1.0 instead of 0.12200431775304404
INFO [MemoryMeter:1] 2013-01-30 21:14:40,072 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_profilePics') liveRatio is
1.0 (just-counted was 1.0). calculation took 3ms for 315 columns
INFO [ScheduledTasks:1] 2013-01-30 21:14:48,898 GCInspector.java
(line 122) GC for ParNew: 201 ms for 1 collections, 2512549552 used;
max is 3886022656
INFO [MemoryMeter:1] 2013-01-30 21:15:32,061 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_SBCodes') liveRatio is
32.58987080513604 (just-counted was 31.082180634662326). calculation
took 2ms for 52 columns
INFO [ScheduledTasks:1] 2013-01-30 21:21:08,322 GCInspector.java
(line 122) GC for ParNew: 239 ms for 1 collections, 2576069224 used;
max is 3886022656
INFO [HintedHandoff:1] 2013-01-30 21:21:45,311
HintedHandOffManager.java (line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
INFO [HintedHandoff:1] 2013-01-30 21:21:45,312
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.152
INFO [MemoryMeter:1] 2013-01-30 21:23:46,635 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_GSData') liveRatio is
7.342488214798115 (just-counted was 7.342488214798115). calculation
took 0ms for 116 columns
INFO [MemoryMeter:1] 2013-01-30 21:24:14,793 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_GSMembers') liveRatio is
5.753869723542124 (just-counted was 5.630372102807786). calculation
took 24ms for 456 columns
INFO [ScheduledTasks:1] 2013-01-30 21:30:18,511 GCInspector.java
(line 122) GC for ParNew: 226 ms for 1 collections, 2242332984 used;
max is 3886022656
INFO [MemoryMeter:1] 2013-01-30 21:30:46,426 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_R1Notifications') liveRatio is
6.832365653565714 (just-counted was 6.675185638944877). calculation
took 43ms for 860 columns
INFO [HintedHandoff:1] 2013-01-30 21:31:45,311
HintedHandOffManager.java (line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
INFO [HintedHandoff:1] 2013-01-30 21:31:45,313
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.152
INFO [ScheduledTasks:1] 2013-01-30 21:33:12,105 GCInspector.java
(line 122) GC for ParNew: 261 ms for 1 collections, 2603400728 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:33:21,312 GCInspector.java
(line 122) GC for ParNew: 248 ms for 1 collections, 3191815000 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:33:23,315 GCInspector.java
(line 122) GC for ParNew: 201 ms for 1 collections, 3471315456 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:33:35,323 GCInspector.java
(line 122) GC for ParNew: 204 ms for 1 collections, 1636739048 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:33:51,462 GCInspector.java
(line 122) GC for ParNew: 239 ms for 1 collections, 2789318016 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:34:00,805 GCInspector.java
(line 122) GC for ParNew: 217 ms for 1 collections, 2959924968 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:34:05,008 GCInspector.java
(line 122) GC for ParNew: 217 ms for 1 collections, 2900952792 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:34:17,262 GCInspector.java
(line 122) GC for ParNew: 231 ms for 1 collections, 2368279416 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:34:19,267 GCInspector.java
(line 122) GC for ParNew: 201 ms for 1 collections, 2297887816 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:34:28,326 GCInspector.java
(line 122) GC for ParNew: 215 ms for 1 collections, 2892129512 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:34:30,328 GCInspector.java
(line 122) GC for ParNew: 266 ms for 1 collections, 3168051328 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:34:42,552 GCInspector.java
(line 122) GC for ParNew: 249 ms for 1 collections, 2330298920 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:34:46,554 GCInspector.java
(line 122) GC for ParNew: 243 ms for 1 collections, 2496880592 used;
max is 3886022656
WARN [ScheduledTasks:1] 2013-01-30 21:35:02,994 GCInspector.java
(line 145) Heap is 0.7641879728660028 full. You may need to reduce
memtable and/or cache sizes. Cassandra will now flush up to the two
largest memtables to free up memory. Adjust
flush_largest_memtables_at threshold in cassandra.yaml if you don't
want Cassandra to do this automatically
WARN [ScheduledTasks:1] 2013-01-30 21:35:02,995 StorageService.java
(line 2880) Flushing CFS(Keyspace='SB_ACD',
ColumnFamily='CF_R2Notifications') to relieve memory pressure
INFO [ScheduledTasks:1] 2013-01-30 21:35:02,995
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_R2Notifications@1630759928(1137418/88800160
serialized/live bytes, 171909 ops)
INFO [FlushWriter:4] 2013-01-30 21:35:03,016 Memtable.java (line 264)
Writing Memtable-CF_R2Notifications@1630759928(1137418/88800160
serialized/live bytes, 171909 ops)
INFO [ScheduledTasks:1] 2013-01-30 21:35:04,293 GCInspector.java
(line 122) GC for ParNew: 455 ms for 2 collections, 2926515240 used;
max is 3886022656
INFO [FlushWriter:4] 2013-01-30 21:35:05,798 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-3832-Data.db
(790256 bytes) for commitlog position
ReplayPosition(segmentId=1359573697867, position=11748063)
INFO [ScheduledTasks:1] 2013-01-30 21:35:06,344 GCInspector.java
(line 122) GC for ParNew: 204 ms for 1 collections, 3227837984 used;
max is 3886022656
INFO [MemoryMeter:1] 2013-01-30 21:35:06,369 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is
37.62080440650366 (just-counted was 12.784246424797216). calculation
took 7181ms for 2683 columns
INFO [ScheduledTasks:1] 2013-01-30 21:35:08,345 GCInspector.java
(line 122) GC for ParNew: 216 ms for 1 collections, 3586092576 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:35:15,625 GCInspector.java
(line 122) GC for ConcurrentMarkSweep: 1130 ms for 2 collections,
602554696 used; max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:35:22,887 GCInspector.java
(line 122) GC for ParNew: 253 ms for 1 collections, 1362776544 used;
max is 3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:35:23,888 GCInspector.java
(line 122) GC for ParNew: 313 ms for 1 collections, 1473067952 used;
max is 3886022656
Node 154:
INFO [HintedHandoff:1] 2013-01-30 21:05:00,728
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:05:00,728
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:15:00,728
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:15:00,729
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [ScheduledTasks:1] 2013-01-30 21:21:28,248 GCInspector.java
(line 122) GC for ParNew: 356 ms for 1 collections, 1272387016 used;
max is 3886022656
INFO [HintedHandoff:1] 2013-01-30 21:25:00,728
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:25:00,729
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:35:00,728
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:35:00,729
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
Node 155:
INFO [HintedHandoff:1] 2013-01-30 21:05:07,916
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:05:07,917
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:15:07,916
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:15:07,917
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:25:07,916
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:25:07,918
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:35:07,917
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:35:07,918
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
I've read that the message 'Finished hinted handoff of 0 rows to
endpoint ...' is because a tombstone not being deleted, and that I
shouldn't worry about it.
I'm also talking with Amazon to check if there is a problem with their
service, the nodes 152 and 153 where using more cpu than the other
four nodes (but they have been like this all over the week, not just a
peak at the very same our for the last 3 days)
All the Cassandra configuration is at default values (the heap size,
row caches disabled, etc.)
Thanks for your support,
2013/1/31 Alain RODRIGUEZ <ar...@gmail.com>:
> @Bryan
>
> "Other space is used but can be more easily controlled by tuning for
> - memtable
> - compaction
> - key cache
> - row cache"
>
> Isn't row cache stored off-heap since a while ?
>
>
> 2013/1/31 Bryan Talbot <bt...@aeriagames.com>
>>
>> On Wed, Jan 30, 2013 at 2:44 PM, Guillermo Barbero
>> <gu...@spotbros.com> wrote:
>>>
>>> WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
>>> setting live ratio to maximum of 64.0 instead of 751.6512549537648
>>>
>>
>> This looks interesting. Doesn't this mean that the ratio of space used
>> for that CF for memory to serialized form is 751:1 but was forced to a lower
>> "sane" value?
>>
>> -Bryan
>>
>
--
Guillermo Barbero - Backend Team
Spotbros Technologies
Re: too many warnings of Heap is full
Posted by Alain RODRIGUEZ <ar...@gmail.com>.
@Bryan
"Other space is used but can be more easily controlled by tuning for
- memtable
- compaction
- key cache
- *row cache*"
Isn't row cache stored off-heap since a while ?
2013/1/31 Bryan Talbot <bt...@aeriagames.com>
> On Wed, Jan 30, 2013 at 2:44 PM, Guillermo Barbero <
> guillermo.barbero@spotbros.com> wrote:
>
>> WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
>> setting live ratio to maximum of 64.0 instead of 751.6512549537648
>>
>>
> This looks interesting. Doesn't this mean that the ratio of space used
> for that CF for memory to serialized form is 751:1 but was forced to a
> lower "sane" value?
>
> -Bryan
>
>
Re: too many warnings of Heap is full
Posted by Bryan Talbot <bt...@aeriagames.com>.
On Wed, Jan 30, 2013 at 2:44 PM, Guillermo Barbero <
guillermo.barbero@spotbros.com> wrote:
> WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
> setting live ratio to maximum of 64.0 instead of 751.6512549537648
>
>
This looks interesting. Doesn't this mean that the ratio of space used for
that CF for memory to serialized form is 751:1 but was forced to a lower
"sane" value?
-Bryan
Re: too many warnings of Heap is full
Posted by Guillermo Barbero <gu...@spotbros.com>.
Guys,
First thanks a lot for your answers.. now I have a little more info, lets
see if this helps:
As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155)
(Amazon EC2 XL instances 17GB RAM) and the problem is that in the last
weeks we are seing that the performace of the cluster falls.. PHP commands
that usually lasted a few milliseconds last up to 15 seconds for just 2/3
minutes every now and then (no specific pattern) (see commands timing in
attached picture).
Today we have found this:
Prior to a fall in performance on all Cassandra nodes log I can see this:
*Node 150:*
INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1264710747(84317/476272
serialized/live bytes, 298 ops)
INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live
bytes, 298 ops)
INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db
(84559 bytes) for commitlog position
ReplayPosition(segmentId=1355151995930, position=32659991)
INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.java
(line 109) Compacting
[SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'),
SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]
*Node 151:*
*
*
INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1453632632(93461/191418
serialized/live bytes, 316 ops)
INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live
bytes, 316 ops)
INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db
(93711 bytes) for commitlog position
ReplayPosition(segmentId=1354832000013, position=20033362)
*Node 153:*
* *INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line
122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is
3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line
122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is
3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line
122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is
3886022656
INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213)
CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
1.4195889775077013 (just-counted was 1.4195889775077013). calculation took
109ms for 64 columns
INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213)
CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
1.5276925191259334 (just-counted was 1.5276925191259334). calculation took
11ms for 129 columns
INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213)
CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
1.561312033901826 (just-counted was 1.561312033901826). calculation took
234ms for 144 columns
INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1120216043(96278/187900
serialized/live bytes, 288 ops)
*Node 154:*
*
*
INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java
(line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153
INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java
(line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145
serialized/live bytes, 286 ops)
INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live
bytes, 286 ops)
Node 155:
INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1083199446(94578/190654
serialized/live bytes, 316 ops)
*Meanwhile on node 152:*
INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db
(639626 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=11032499)
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line
122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is
3886022656
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line
57) Pool Name Active Pending Blocked
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line
72) ReadStage 32 343 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
72) RequestResponseStage 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
72) ReadRepairStage 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
72) MutationStage 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
72) ReplicateOnWriteStage 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
72) GossipStage 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
72) AntiEntropyStage 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
72) MigrationStage 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
72) StreamStage 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
72) MemtablePostFlusher 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
72) FlushWriter 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
72) MiscStage 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
72) commitlog_archiver 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
72) InternalResponseStage 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
72) HintedHandoff 0 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
77) CompactionManager 0 0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
89) MessagingService n/a 0,24
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
99) Cache Type Size Capacity
KeysToSave
Provider
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
100) KeyCache 133569 2184533
all
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
106) RowCache 0 0
all
org.apache.cassandra.cache.SerializingCacheProvider
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
113) ColumnFamily Memtable ops,data
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
116) system.NodeIdInfo 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.IndexInfo 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.LocationInfo 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.Versions 3,103
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.schema_keyspaces 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line
116) system.Migrations 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line
116) system.schema_columnfamilies 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
116) system.schema_columns 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
116) system.HintsColumnFamily 1,2369
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
116) system.Schema 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_mailPipes 9241,30380203
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_profilePics 427,1173642
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_SBVars 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_SBCounters 5,232
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line
116) SB_NNCD.CF_userPhoneAgenda 7919,2332898
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_BSLUsers 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_mailTimeline 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_SBMessages 54390,15054201
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_NewsFeedTimeline 1866,7681861
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_BSLSpots 1,616
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_SBUniqueIds 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_SBChecks 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_NFTimelines 423,533156
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_SBAppUsersProfilePic 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_SBLikes 16,565
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_SBCodes 154,93735
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_attachmentsPerUser 80,107551
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_comments 58,77581
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_NNCD.CF_attachments 641,247796
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_NNCD.CF_SBStats 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_NNCD.CF_userReversedPhoneAgenda 9047,7132129
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_ACD.CF_GSMembers 823,1206779
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line
116) SB_ACD.CF_NSLoad 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line
116) SB_ACD.CF_sessions 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line
116) SB_ACD.CF_userGS 193,577229
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
116) SB_ACD.CF_userApps 8,170
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
116) SB_ACD.CF_GSData 126,3503
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line
116) SB_ACD.CF_R2Notifications 1906,58481
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line
116) SB_ACD.CF_uniqueKeys 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line
116) SB_ACD.CF_userContactWannabes 953,438671
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line
116) SB_ACD.CF_SBApps 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line
116) SB_ACD.CF_SBAppUsers 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
116) SB_ACD.CF_R1Notifications 1354,2070513
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
116) SB_ACD.CF_SBVotes 463,230386
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line
116) SB_ACD.CF_userContacts 2968,4849519
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
116) SB_ACD.CF_sessionsSuspended 0,0
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
116) SB_ACD.CF_userBlockedList 5,207
INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line
116) SB_ACD.CF_users 65386,10852776
WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line 145)
Heap is 0.7514124302624755 full. You may need to reduce memtable and/or
cache sizes. Cassandra will now flush up to the two largest memtables to
free up memory. Adjust flush_largest_memtables_at threshold in
cassandra.yaml if you don't want Cassandra to do this automatically
WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line
2880) Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to
relieve memory pressure
INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java
(line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355
serialized/live bytes, 9380 ops)
INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264)
Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live
bytes, 9380 ops)
INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db
(2049401 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=13578602)
WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
setting live ratio to maximum of 64.0 instead of 751.6512549537648
INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is 64.0
(just-counted was 64.0). calculation took 61ms for 8 columns
INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_userGS') liveRatio is
5.309650177091864 (just-counted was 5.309650177091864). calculation took
11ms for 170 columns
INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java
(line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369
serialized/live bytes, 2 ops)
INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live
bytes, 2 ops)
INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db
(1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280,
position=17323736)
INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java
(line 109) Compacting
[SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'),
SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]
The cluster is perfectly balanced:
Note: Ownership information does not include topology, please specify a
keyspace.
Address DC Rack Status State Load Owns
Token
141784319550391032739561396922763706368
10.0.0.150 datacenter1 rack1 Up Normal 13.25 GB
16.67% 0
10.0.0.151 datacenter1 rack1 Up Normal 10.31 GB
16.67% 28356863910078203714492389662765613056
10.0.0.152 datacenter1 rack1 Up Normal 8.58 GB
16.67% 56713727820156407428984779325531226112
10.0.0.153 datacenter1 rack1 Up Normal 8.1 GB
16.67% 85070591730234615865843651857942052864
10.0.0.154 datacenter1 rack1 Up Normal 8.28 GB
16.67% 113427455640312814857969558651062452224
10.0.0.155 datacenter1 rack1 Up Normal 9.38 GB
16.67% 141784319550391032739561396922763706368
I guess something happens on 152 that makes other nodes believe it is dead,
then does something to fix itself and comes back after (messing up
performance in the meanwhile) .Does this makes any sense? Any clue of what
might be going on? I can post more logs if you guys need
Thanks again!!
Guillermo
2013/1/30 Nate McCall <zz...@gmail.com>
> What's the output of nodetool cfstats for those 2 column families on
> cassNode2 and cassNode3? And what is the replication factor for this
> cluster?
>
> Per the previous reply, nodetool ring should show each of your nodes
> with ~16.7% of the data if well balanced.
>
> Also, the auto-detection for memory sizes in the startup script is a
> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
> ram. It usually ends up allocating 4g/400m (max and young) whereas
> 8g/800m will give you some more breathing room.
>
> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <bt...@aeriagames.com>
> wrote:
> > My guess is that those one or two nodes with the gc pressure also have
> more
> > rows in your big CF. More rows could be due to imbalanced distribution
> if
> > your'e not using a random partitioner or from those nodes not yet
> removing
> > deleted rows which other nodes may have done.
> >
> > JVM heap space is used for a few things which scale with key count
> > including:
> > - bloom filter (for C* < 1.2)
> > - index samples
> >
> > Other space is used but can be more easily controlled by tuning for
> > - memtable
> > - compaction
> > - key cache
> > - row cache
> >
> >
> > So, if those nodes have more rows (check using "nodetool ring" or
> "nodetool
> > cfstats") than the others you can try to:
> > - reduce the number of rows by adding nodes, run manual / tune
> compactions
> > to remove rows with expired tombstones, etc.
> > - increase bloom filter fp chance
> > - increase jvm heap size (don't go too big)
> > - disable key or row cache
> > - increase index sample interval
> >
> > Not all of those things are generally good especially to the extreme so
> > don't go setting a 20 GB jvm heap without understanding the consequences
> for
> > example.
> >
> > -Bryan
> >
> >
> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
> > <gu...@spotbros.com> wrote:
> >>
> >> Hi,
> >>
> >> I'm viewing a weird behaviour in my cassandra cluster. Most of the
> >> warning messages are due to Heap is % full. According to this link
> >>
> >> (
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html
> )
> >> there are two ways to "reduce pressure":
> >> 1. Decrease the cache sizes
> >> 2. Increase the index interval size
> >>
> >> Most of the flushes are in two column families (users and messages), I
> >> guess that's because the most mutations are there.
> >>
> >> I still have not applied those changes to the production environment.
> >> Do you recommend any other meassure? Should I set specific tunning for
> >> these two CFs? Should I check another metric?
> >>
> >> Additionally, the distribution of warning messages is not uniform
> >> along the cluster. Why could cassandra be doing this? What should I do
> >> to find out how to fix this?
> >>
> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
> >> the java version is the following:
> >> java version "1.6.0_37"
> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
> >>
> >> The cassandra system.log is resumed here (numer of messages, cassandra
> >> node, class that reports the message, first word of the message)
> >> 2013-01-26
> >> 5 cassNode0: GCInspector.java Heap
> >> 5 cassNode0: StorageService.java Flushing
> >> 232 cassNode2: GCInspector.java Heap
> >> 232 cassNode2: StorageService.java Flushing
> >> 104 cassNode3: GCInspector.java Heap
> >> 104 cassNode3: StorageService.java Flushing
> >> 3 cassNode4: GCInspector.java Heap
> >> 3 cassNode4: StorageService.java Flushing
> >> 3 cassNode5: GCInspector.java Heap
> >> 3 cassNode5: StorageService.java Flushing
> >>
> >> 2013-01-27
> >> 2 cassNode0: GCInspector.java Heap
> >> 2 cassNode0: StorageService.java Flushing
> >> 3 cassNode1: GCInspector.java Heap
> >> 3 cassNode1: StorageService.java Flushing
> >> 189 cassNode2: GCInspector.java Heap
> >> 189 cassNode2: StorageService.java Flushing
> >> 104 cassNode3: GCInspector.java Heap
> >> 104 cassNode3: StorageService.java Flushing
> >> 1 cassNode4: GCInspector.java Heap
> >> 1 cassNode4: StorageService.java Flushing
> >> 1 cassNode5: GCInspector.java Heap
> >> 1 cassNode5: StorageService.java Flushing
> >>
> >> 2013-01-28
> >> 2 cassNode0: GCInspector.java Heap
> >> 2 cassNode0: StorageService.java Flushing
> >> 1 cassNode1: GCInspector.java Heap
> >> 1 cassNode1: StorageService.java Flushing
> >> 1 cassNode2: AutoSavingCache.java Reducing
> >> 343 cassNode2: GCInspector.java Heap
> >> 342 cassNode2: StorageService.java Flushing
> >> 181 cassNode3: GCInspector.java Heap
> >> 181 cassNode3: StorageService.java Flushing
> >> 4 cassNode4: GCInspector.java Heap
> >> 4 cassNode4: StorageService.java Flushing
> >> 3 cassNode5: GCInspector.java Heap
> >> 3 cassNode5: StorageService.java Flushing
> >>
> >> 2013-01-29
> >> 2 cassNode0: GCInspector.java Heap
> >> 2 cassNode0: StorageService.java Flushing
> >> 3 cassNode1: GCInspector.java Heap
> >> 3 cassNode1: StorageService.java Flushing
> >> 156 cassNode2: GCInspector.java Heap
> >> 156 cassNode2: StorageService.java Flushing
> >> 71 cassNode3: GCInspector.java Heap
> >> 71 cassNode3: StorageService.java Flushing
> >> 2 cassNode4: GCInspector.java Heap
> >> 2 cassNode4: StorageService.java Flushing
> >> 2 cassNode5: GCInspector.java Heap
> >> 1 cassNode5: Memtable.java setting
> >> 2 cassNode5: StorageService.java Flushing
> >>
> >> --
> >>
> >> Guillermo Barbero - Backend Team
> >>
> >> Spotbros Technologies
> >
> >
> >
>
--
<http://spotbros.com>
Guillermo Barbero - Backend Team
Spotbros Technologies
<http://www.facebook.com/spotbros> <http://twitter.com/spotbros>
Re: too many warnings of Heap is full
Posted by Nate McCall <zz...@gmail.com>.
What's the output of nodetool cfstats for those 2 column families on
cassNode2 and cassNode3? And what is the replication factor for this
cluster?
Per the previous reply, nodetool ring should show each of your nodes
with ~16.7% of the data if well balanced.
Also, the auto-detection for memory sizes in the startup script is a
little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
ram. It usually ends up allocating 4g/400m (max and young) whereas
8g/800m will give you some more breathing room.
On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <bt...@aeriagames.com> wrote:
> My guess is that those one or two nodes with the gc pressure also have more
> rows in your big CF. More rows could be due to imbalanced distribution if
> your'e not using a random partitioner or from those nodes not yet removing
> deleted rows which other nodes may have done.
>
> JVM heap space is used for a few things which scale with key count
> including:
> - bloom filter (for C* < 1.2)
> - index samples
>
> Other space is used but can be more easily controlled by tuning for
> - memtable
> - compaction
> - key cache
> - row cache
>
>
> So, if those nodes have more rows (check using "nodetool ring" or "nodetool
> cfstats") than the others you can try to:
> - reduce the number of rows by adding nodes, run manual / tune compactions
> to remove rows with expired tombstones, etc.
> - increase bloom filter fp chance
> - increase jvm heap size (don't go too big)
> - disable key or row cache
> - increase index sample interval
>
> Not all of those things are generally good especially to the extreme so
> don't go setting a 20 GB jvm heap without understanding the consequences for
> example.
>
> -Bryan
>
>
> On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
> <gu...@spotbros.com> wrote:
>>
>> Hi,
>>
>> I'm viewing a weird behaviour in my cassandra cluster. Most of the
>> warning messages are due to Heap is % full. According to this link
>>
>> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
>> there are two ways to "reduce pressure":
>> 1. Decrease the cache sizes
>> 2. Increase the index interval size
>>
>> Most of the flushes are in two column families (users and messages), I
>> guess that's because the most mutations are there.
>>
>> I still have not applied those changes to the production environment.
>> Do you recommend any other meassure? Should I set specific tunning for
>> these two CFs? Should I check another metric?
>>
>> Additionally, the distribution of warning messages is not uniform
>> along the cluster. Why could cassandra be doing this? What should I do
>> to find out how to fix this?
>>
>> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>> the java version is the following:
>> java version "1.6.0_37"
>> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>>
>> The cassandra system.log is resumed here (numer of messages, cassandra
>> node, class that reports the message, first word of the message)
>> 2013-01-26
>> 5 cassNode0: GCInspector.java Heap
>> 5 cassNode0: StorageService.java Flushing
>> 232 cassNode2: GCInspector.java Heap
>> 232 cassNode2: StorageService.java Flushing
>> 104 cassNode3: GCInspector.java Heap
>> 104 cassNode3: StorageService.java Flushing
>> 3 cassNode4: GCInspector.java Heap
>> 3 cassNode4: StorageService.java Flushing
>> 3 cassNode5: GCInspector.java Heap
>> 3 cassNode5: StorageService.java Flushing
>>
>> 2013-01-27
>> 2 cassNode0: GCInspector.java Heap
>> 2 cassNode0: StorageService.java Flushing
>> 3 cassNode1: GCInspector.java Heap
>> 3 cassNode1: StorageService.java Flushing
>> 189 cassNode2: GCInspector.java Heap
>> 189 cassNode2: StorageService.java Flushing
>> 104 cassNode3: GCInspector.java Heap
>> 104 cassNode3: StorageService.java Flushing
>> 1 cassNode4: GCInspector.java Heap
>> 1 cassNode4: StorageService.java Flushing
>> 1 cassNode5: GCInspector.java Heap
>> 1 cassNode5: StorageService.java Flushing
>>
>> 2013-01-28
>> 2 cassNode0: GCInspector.java Heap
>> 2 cassNode0: StorageService.java Flushing
>> 1 cassNode1: GCInspector.java Heap
>> 1 cassNode1: StorageService.java Flushing
>> 1 cassNode2: AutoSavingCache.java Reducing
>> 343 cassNode2: GCInspector.java Heap
>> 342 cassNode2: StorageService.java Flushing
>> 181 cassNode3: GCInspector.java Heap
>> 181 cassNode3: StorageService.java Flushing
>> 4 cassNode4: GCInspector.java Heap
>> 4 cassNode4: StorageService.java Flushing
>> 3 cassNode5: GCInspector.java Heap
>> 3 cassNode5: StorageService.java Flushing
>>
>> 2013-01-29
>> 2 cassNode0: GCInspector.java Heap
>> 2 cassNode0: StorageService.java Flushing
>> 3 cassNode1: GCInspector.java Heap
>> 3 cassNode1: StorageService.java Flushing
>> 156 cassNode2: GCInspector.java Heap
>> 156 cassNode2: StorageService.java Flushing
>> 71 cassNode3: GCInspector.java Heap
>> 71 cassNode3: StorageService.java Flushing
>> 2 cassNode4: GCInspector.java Heap
>> 2 cassNode4: StorageService.java Flushing
>> 2 cassNode5: GCInspector.java Heap
>> 1 cassNode5: Memtable.java setting
>> 2 cassNode5: StorageService.java Flushing
>>
>> --
>>
>> Guillermo Barbero - Backend Team
>>
>> Spotbros Technologies
>
>
>
Re: too many warnings of Heap is full
Posted by Bryan Talbot <bt...@aeriagames.com>.
My guess is that those one or two nodes with the gc pressure also have more
rows in your big CF. More rows could be due to imbalanced distribution if
your'e not using a random partitioner or from those nodes not yet removing
deleted rows which other nodes may have done.
JVM heap space is used for a few things which scale with key count
including:
- bloom filter (for C* < 1.2)
- index samples
Other space is used but can be more easily controlled by tuning for
- memtable
- compaction
- key cache
- row cache
So, if those nodes have more rows (check using "nodetool ring" or "nodetool
cfstats") than the others you can try to:
- reduce the number of rows by adding nodes, run manual / tune compactions
to remove rows with expired tombstones, etc.
- increase bloom filter fp chance
- increase jvm heap size (don't go too big)
- disable key or row cache
- increase index sample interval
Not all of those things are generally good especially to the extreme so
don't go setting a 20 GB jvm heap without understanding
the consequences for example.
-Bryan
On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero <
guillermo.barbero@spotbros.com> wrote:
> Hi,
>
> I'm viewing a weird behaviour in my cassandra cluster. Most of the
> warning messages are due to Heap is % full. According to this link
> (
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html
> )
> there are two ways to "reduce pressure":
> 1. Decrease the cache sizes
> 2. Increase the index interval size
>
> Most of the flushes are in two column families (users and messages), I
> guess that's because the most mutations are there.
>
> I still have not applied those changes to the production environment.
> Do you recommend any other meassure? Should I set specific tunning for
> these two CFs? Should I check another metric?
>
> Additionally, the distribution of warning messages is not uniform
> along the cluster. Why could cassandra be doing this? What should I do
> to find out how to fix this?
>
> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
> the java version is the following:
> java version "1.6.0_37"
> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>
> The cassandra system.log is resumed here (numer of messages, cassandra
> node, class that reports the message, first word of the message)
> 2013-01-26
> 5 cassNode0: GCInspector.java Heap
> 5 cassNode0: StorageService.java Flushing
> 232 cassNode2: GCInspector.java Heap
> 232 cassNode2: StorageService.java Flushing
> 104 cassNode3: GCInspector.java Heap
> 104 cassNode3: StorageService.java Flushing
> 3 cassNode4: GCInspector.java Heap
> 3 cassNode4: StorageService.java Flushing
> 3 cassNode5: GCInspector.java Heap
> 3 cassNode5: StorageService.java Flushing
>
> 2013-01-27
> 2 cassNode0: GCInspector.java Heap
> 2 cassNode0: StorageService.java Flushing
> 3 cassNode1: GCInspector.java Heap
> 3 cassNode1: StorageService.java Flushing
> 189 cassNode2: GCInspector.java Heap
> 189 cassNode2: StorageService.java Flushing
> 104 cassNode3: GCInspector.java Heap
> 104 cassNode3: StorageService.java Flushing
> 1 cassNode4: GCInspector.java Heap
> 1 cassNode4: StorageService.java Flushing
> 1 cassNode5: GCInspector.java Heap
> 1 cassNode5: StorageService.java Flushing
>
> 2013-01-28
> 2 cassNode0: GCInspector.java Heap
> 2 cassNode0: StorageService.java Flushing
> 1 cassNode1: GCInspector.java Heap
> 1 cassNode1: StorageService.java Flushing
> 1 cassNode2: AutoSavingCache.java Reducing
> 343 cassNode2: GCInspector.java Heap
> 342 cassNode2: StorageService.java Flushing
> 181 cassNode3: GCInspector.java Heap
> 181 cassNode3: StorageService.java Flushing
> 4 cassNode4: GCInspector.java Heap
> 4 cassNode4: StorageService.java Flushing
> 3 cassNode5: GCInspector.java Heap
> 3 cassNode5: StorageService.java Flushing
>
> 2013-01-29
> 2 cassNode0: GCInspector.java Heap
> 2 cassNode0: StorageService.java Flushing
> 3 cassNode1: GCInspector.java Heap
> 3 cassNode1: StorageService.java Flushing
> 156 cassNode2: GCInspector.java Heap
> 156 cassNode2: StorageService.java Flushing
> 71 cassNode3: GCInspector.java Heap
> 71 cassNode3: StorageService.java Flushing
> 2 cassNode4: GCInspector.java Heap
> 2 cassNode4: StorageService.java Flushing
> 2 cassNode5: GCInspector.java Heap
> 1 cassNode5: Memtable.java setting
> 2 cassNode5: StorageService.java Flushing
>
> --
>
> Guillermo Barbero - Backend Team
>
> Spotbros Technologies
>