You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Guillermo Barbero <gu...@spotbros.com> on 2013/01/30 12:47:57 UTC

too many warnings of Heap is full

Hi,

  I'm viewing a weird behaviour in my cassandra cluster. Most of the
warning messages are due to Heap is % full. According to this link
(http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
there are two ways to "reduce pressure":
1. Decrease the cache sizes
2. Increase the index interval size

Most of the flushes are in two column families (users and messages), I
guess that's because the most mutations are there.

I still have not applied those changes to the production environment.
Do you recommend any other meassure? Should I set specific tunning for
these two CFs? Should I check another metric?

Additionally, the distribution of warning messages is not uniform
along the cluster. Why could cassandra be doing this? What should I do
to find out how to fix this?

cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
the java version is the following:
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)

The cassandra system.log is resumed here (numer of messages, cassandra
node, class that reports the message, first word of the message)
2013-01-26
      5 cassNode0: GCInspector.java Heap
      5 cassNode0: StorageService.java Flushing
    232 cassNode2: GCInspector.java Heap
    232 cassNode2: StorageService.java Flushing
    104 cassNode3: GCInspector.java Heap
    104 cassNode3: StorageService.java Flushing
      3 cassNode4: GCInspector.java Heap
      3 cassNode4: StorageService.java Flushing
      3 cassNode5: GCInspector.java Heap
      3 cassNode5: StorageService.java Flushing

2013-01-27
      2 cassNode0: GCInspector.java Heap
      2 cassNode0: StorageService.java Flushing
      3 cassNode1: GCInspector.java Heap
      3 cassNode1: StorageService.java Flushing
    189 cassNode2: GCInspector.java Heap
    189 cassNode2: StorageService.java Flushing
    104 cassNode3: GCInspector.java Heap
    104 cassNode3: StorageService.java Flushing
      1 cassNode4: GCInspector.java Heap
      1 cassNode4: StorageService.java Flushing
      1 cassNode5: GCInspector.java Heap
      1 cassNode5: StorageService.java Flushing

2013-01-28
      2 cassNode0: GCInspector.java Heap
      2 cassNode0: StorageService.java Flushing
      1 cassNode1: GCInspector.java Heap
      1 cassNode1: StorageService.java Flushing
      1 cassNode2: AutoSavingCache.java Reducing
    343 cassNode2: GCInspector.java Heap
    342 cassNode2: StorageService.java Flushing
    181 cassNode3: GCInspector.java Heap
    181 cassNode3: StorageService.java Flushing
      4 cassNode4: GCInspector.java Heap
      4 cassNode4: StorageService.java Flushing
      3 cassNode5: GCInspector.java Heap
      3 cassNode5: StorageService.java Flushing

2013-01-29
      2 cassNode0: GCInspector.java Heap
      2 cassNode0: StorageService.java Flushing
      3 cassNode1: GCInspector.java Heap
      3 cassNode1: StorageService.java Flushing
    156 cassNode2: GCInspector.java Heap
    156 cassNode2: StorageService.java Flushing
     71 cassNode3: GCInspector.java Heap
     71 cassNode3: StorageService.java Flushing
      2 cassNode4: GCInspector.java Heap
      2 cassNode4: StorageService.java Flushing
      2 cassNode5: GCInspector.java Heap
      1 cassNode5: Memtable.java setting
      2 cassNode5: StorageService.java Flushing

--

Guillermo Barbero - Backend Team

Spotbros Technologies

Re: too many warnings of Heap is full

Posted by Nate McCall <zz...@gmail.com>.
Your latencies and distribution look fine.

How big/what types of queries are you issuing? Are you issuing a lot
of large multigets?

Also, do either of these column families have secondary indexes?

On Wed, Jan 30, 2013 at 2:59 PM, Guillermo Barbero
<gu...@spotbros.com> wrote:
> Iep,
>
>   I missed the attachment...
>
>   Also, these are the cfstats of 152:
>
> Column Family: CF_SBMessages
> SSTable count: 3
> Space used (live): 967238560
> Space used (total): 967238560
> Number of Keys (estimate): 3263232
> Memtable Columns Count: 2112
> Memtable Data Size: 577135
> Memtable Switch Count: 5
> Read Count: 20702
> Read Latency: 0.181 ms.
> Write Count: 19888
> Write Latency: 0.059 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 14
> Bloom Filter False Ratio: 0.09375
> Bloom Filter Space Used: 6171168
> Compacted row minimum size: 536
> Compacted row maximum size: 24601
> Compacted row mean size: 850
>
> Column Family: CF_users
> SSTable count: 3
> Space used (live): 152205376
> Space used (total): 152205376
> Number of Keys (estimate): 343040
> Memtable Columns Count: 154398
> Memtable Data Size: 21159410
> Memtable Switch Count: 9
> Read Count: 11816
> Read Latency: 1.348 ms.
> Write Count: 128751
> Write Latency: 0.090 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 0
> Bloom Filter False Ratio: 0.00000
> Bloom Filter Space Used: 688464
> Compacted row minimum size: 61
> Compacted row maximum size: 3311
> Compacted row mean size: 1235
>
> And the cfstats of 153:
>
> Column Family: CF_SBMessages
> SSTable count: 5
> Space used (live): 965495648
> Space used (total): 965495648
> Number of Keys (estimate): 3257216
> Memtable Columns Count: 56541
> Memtable Data Size: 15699960
> Memtable Switch Count: 2
> Read Count: 22475
> Read Latency: 0.142 ms.
> Write Count: 21719
> Write Latency: 0.073 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 43
> Bloom Filter False Ratio: 0.19545
> Bloom Filter Space Used: 6161032
> Compacted row minimum size: 536
> Compacted row maximum size: 11864
> Compacted row mean size: 872
>
> Column Family: CF_users
> SSTable count: 2
> Space used (live): 148762893
> Space used (total): 148762893
> Number of Keys (estimate): 333312
> Memtable Columns Count: 129725
> Memtable Data Size: 17144125
> Memtable Switch Count: 4
> Read Count: 7440
> Read Latency: 1.329 ms.
> Write Count: 127465
> Write Latency: 0.093 ms.
> Pending Tasks: 0
> Bloom Filter False Postives: 0
> Bloom Filter False Ratio: 0.00000
> Bloom Filter Space Used: 694112
> Compacted row minimum size: 61
> Compacted row maximum size: 3311
> Compacted row mean size: 1298
>
> The messages are in a keyspace with a replication factor of 2 and the
> users are in a keyspace with a replication factor of 3.
>
> Ah, and we use the RandomPartitioner.
>
> Thanks again
>
>
> 2013/1/30 Guillermo Barbero <gu...@spotbros.com>
>>
>> Guys,
>>
>> First thanks a lot for your answers.. now I have a little more info, lets see if this helps:
>>
>> As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155) (Amazon EC2 XL instances 17GB RAM) and the problem is that in the last weeks we are seing that the performace of the cluster falls.. PHP commands that usually lasted a few milliseconds last up to 15 seconds for just 2/3 minutes every now and then (no specific pattern) (see commands timing in attached picture).
>>
>> Today we have found this:
>>
>> Prior to a fall in performance on all Cassandra nodes log I can see this:
>>
>> Node 150:
>>
>> INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>>  INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
>>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
>>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db (84559 bytes) for commitlog position ReplayPosition(segmentId=1355151995930, position=32659991)
>>  INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]
>>
>> Node 151:
>>
>>  INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>>  INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
>>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
>>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db (93711 bytes) for commitlog position ReplayPosition(segmentId=1354832000013, position=20033362)
>>
>> Node 153:
>>
>>  INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line 122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is 3886022656
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line 122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is 3886022656
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line 122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is 3886022656
>>  INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.4195889775077013 (just-counted was 1.4195889775077013).  calculation took 109ms for 64 columns
>>  INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.5276925191259334 (just-counted was 1.5276925191259334).  calculation took 11ms for 129 columns
>>  INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>>  INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.561312033901826 (just-counted was 1.561312033901826).  calculation took 234ms for 144 columns
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1120216043(96278/187900 serialized/live bytes, 288 ops)
>>
>> Node 154:
>>
>> INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153
>>  INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>>  INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)
>>  INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)
>>
>>
>> Node 155:
>>
>> INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>>  INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1083199446(94578/190654 serialized/live bytes, 316 ops)
>>
>> Meanwhile on node 152:
>>
>> INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db (639626 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=11032499)
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line 122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is 3886022656
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line 57) Pool Name                    Active   Pending   Blocked
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line 72) ReadStage                        32       343         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) RequestResponseStage              0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) ReadRepairStage                   0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) MutationStage                     0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) ReplicateOnWriteStage             0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) GossipStage                       0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) AntiEntropyStage                  0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) MigrationStage                    0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) StreamStage                       0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MemtablePostFlusher               0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) FlushWriter                       0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MiscStage                         0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) commitlog_archiver                0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) InternalResponseStage             0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) HintedHandoff                     0         0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 77) CompactionManager                 0         0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 89) MessagingService                n/a      0,24
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 99) Cache Type                     Size                 Capacity               KeysToSave                                                         Provider
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 100) KeyCache                     133569                  2184533                      all
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 106) RowCache                          0                        0                      all              org.apache.cassandra.cache.SerializingCacheProvider
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 113) ColumnFamily                Memtable ops,data
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 116) system.NodeIdInfo                         0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.IndexInfo                          0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.LocationInfo                       0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.Versions                         3,103
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.schema_keyspaces                   0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line 116) system.Migrations                         0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line 116) system.schema_columnfamilies                 0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.schema_columns                     0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.HintsColumnFamily               1,2369
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.Schema                             0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_mailPipes            9241,30380203
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_profilePics            427,1173642
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBVars                         0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBCounters                   5,232
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line 116) SB_NNCD.CF_userPhoneAgenda        7919,2332898
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_BSLUsers                       0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_mailTimeline                   0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_SBMessages          54390,15054201
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_NewsFeedTimeline        1866,7681861
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_BSLSpots                     1,616
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBUniqueIds                    0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBChecks                       0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_NFTimelines             423,533156
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBAppUsersProfilePic                 0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBLikes                     16,565
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBCodes                  154,93735
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_attachmentsPerUser           80,107551
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_comments                  58,77581
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_attachments             641,247796
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_SBStats                        0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_userReversedPhoneAgenda        9047,7132129
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_ACD.CF_GSMembers               823,1206779
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line 116) SB_ACD.CF_NSLoad                          0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line 116) SB_ACD.CF_sessions                        0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line 116) SB_ACD.CF_userGS                   193,577229
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_userApps                      8,170
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_GSData                     126,3503
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line 116) SB_ACD.CF_R2Notifications          1906,58481
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line 116) SB_ACD.CF_uniqueKeys                      0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line 116) SB_ACD.CF_userContactWannabes          953,438671
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line 116) SB_ACD.CF_SBApps                          0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line 116) SB_ACD.CF_SBAppUsers                      0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_R1Notifications        1354,2070513
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_SBVotes                  463,230386
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line 116) SB_ACD.CF_userContacts           2968,4849519
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_sessionsSuspended                 0,0
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_userBlockedList               5,207
>>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line 116) SB_ACD.CF_users                65386,10852776
>> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line 145) Heap is 0.7514124302624755 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>>  WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line 2880) Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to relieve memory pressure
>>  INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes, 9380 ops)
>>  INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264) Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes, 9380 ops)
>>  INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db (2049401 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=13578602)
>>  WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202) setting live ratio to maximum of 64.0 instead of 751.6512549537648
>>  INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213) CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is 64.0 (just-counted was 64.0).  calculation took 61ms for 8 columns
>>  INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213) CFS(Keyspace='SB_ACD', ColumnFamily='CF_userGS') liveRatio is 5.309650177091864 (just-counted was 5.309650177091864).  calculation took 11ms for 170 columns
>>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java (line 296) Started hinted handoff for token: 85070591730234615865843651857942052864 with IP: /10.0.0.153
>>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2 ops)
>>  INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2 ops)
>>  INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db (1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=17323736)
>>  INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]
>>
>> The cluster is perfectly balanced:
>>
>> Note: Ownership information does not include topology, please specify a keyspace.
>> Address         DC          Rack        Status State   Load            Owns                Token
>>                                                                                            141784319550391032739561396922763706368
>> 10.0.0.150      datacenter1 rack1       Up     Normal  13.25 GB        16.67%              0
>> 10.0.0.151      datacenter1 rack1       Up     Normal  10.31 GB        16.67%              28356863910078203714492389662765613056
>> 10.0.0.152      datacenter1 rack1       Up     Normal  8.58 GB         16.67%              56713727820156407428984779325531226112
>> 10.0.0.153      datacenter1 rack1       Up     Normal  8.1 GB          16.67%              85070591730234615865843651857942052864
>> 10.0.0.154      datacenter1 rack1       Up     Normal  8.28 GB         16.67%              113427455640312814857969558651062452224
>> 10.0.0.155      datacenter1 rack1       Up     Normal  9.38 GB         16.67%              141784319550391032739561396922763706368
>>
>>
>> I guess something happens on 152 that makes other nodes believe it is dead, then does something to fix itself and comes back after (messing up performance in the meanwhile) .Does this makes any sense? Any clue of what might be going on? I can post more logs if you guys need
>>
>> Thanks again!!
>>
>> Guillermo
>>
>>
>>
>> 2013/1/30 Nate McCall <zz...@gmail.com>
>>>
>>> What's the output of nodetool cfstats for those 2 column families on
>>> cassNode2 and cassNode3? And what is the replication factor for this
>>> cluster?
>>>
>>> Per the previous reply, nodetool ring should show each of your nodes
>>> with ~16.7% of the data if well balanced.
>>>
>>> Also, the auto-detection for memory sizes in the startup script is a
>>> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
>>> ram. It usually ends up allocating 4g/400m (max and young) whereas
>>> 8g/800m will give you some more breathing room.
>>>
>>> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <bt...@aeriagames.com> wrote:
>>> > My guess is that those one or two nodes with the gc pressure also have more
>>> > rows in your big CF.  More rows could be due to imbalanced distribution if
>>> > your'e not using a random partitioner or from those nodes not yet removing
>>> > deleted rows which other nodes may have done.
>>> >
>>> > JVM heap space is used for a few things which scale with key count
>>> > including:
>>> > - bloom filter (for C* < 1.2)
>>> > - index samples
>>> >
>>> > Other space is used but can be more easily controlled by tuning for
>>> > - memtable
>>> > - compaction
>>> > - key cache
>>> > - row cache
>>> >
>>> >
>>> > So, if those nodes have more rows (check using "nodetool ring" or "nodetool
>>> > cfstats") than the others you can try to:
>>> > - reduce the number of rows by adding nodes, run manual / tune compactions
>>> > to remove rows with expired tombstones, etc.
>>> > - increase bloom filter fp chance
>>> > - increase jvm heap size (don't go too big)
>>> > - disable key or row cache
>>> > - increase index sample interval
>>> >
>>> > Not all of those things are generally good especially to the extreme so
>>> > don't go setting a 20 GB jvm heap without understanding the consequences for
>>> > example.
>>> >
>>> > -Bryan
>>> >
>>> >
>>> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
>>> > <gu...@spotbros.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >>   I'm viewing a weird behaviour in my cassandra cluster. Most of the
>>> >> warning messages are due to Heap is % full. According to this link
>>> >>
>>> >> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
>>> >> there are two ways to "reduce pressure":
>>> >> 1. Decrease the cache sizes
>>> >> 2. Increase the index interval size
>>> >>
>>> >> Most of the flushes are in two column families (users and messages), I
>>> >> guess that's because the most mutations are there.
>>> >>
>>> >> I still have not applied those changes to the production environment.
>>> >> Do you recommend any other meassure? Should I set specific tunning for
>>> >> these two CFs? Should I check another metric?
>>> >>
>>> >> Additionally, the distribution of warning messages is not uniform
>>> >> along the cluster. Why could cassandra be doing this? What should I do
>>> >> to find out how to fix this?
>>> >>
>>> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>>> >> the java version is the following:
>>> >> java version "1.6.0_37"
>>> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>>> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>>> >>
>>> >> The cassandra system.log is resumed here (numer of messages, cassandra
>>> >> node, class that reports the message, first word of the message)
>>> >> 2013-01-26
>>> >>       5 cassNode0: GCInspector.java Heap
>>> >>       5 cassNode0: StorageService.java Flushing
>>> >>     232 cassNode2: GCInspector.java Heap
>>> >>     232 cassNode2: StorageService.java Flushing
>>> >>     104 cassNode3: GCInspector.java Heap
>>> >>     104 cassNode3: StorageService.java Flushing
>>> >>       3 cassNode4: GCInspector.java Heap
>>> >>       3 cassNode4: StorageService.java Flushing
>>> >>       3 cassNode5: GCInspector.java Heap
>>> >>       3 cassNode5: StorageService.java Flushing
>>> >>
>>> >> 2013-01-27
>>> >>       2 cassNode0: GCInspector.java Heap
>>> >>       2 cassNode0: StorageService.java Flushing
>>> >>       3 cassNode1: GCInspector.java Heap
>>> >>       3 cassNode1: StorageService.java Flushing
>>> >>     189 cassNode2: GCInspector.java Heap
>>> >>     189 cassNode2: StorageService.java Flushing
>>> >>     104 cassNode3: GCInspector.java Heap
>>> >>     104 cassNode3: StorageService.java Flushing
>>> >>       1 cassNode4: GCInspector.java Heap
>>> >>       1 cassNode4: StorageService.java Flushing
>>> >>       1 cassNode5: GCInspector.java Heap
>>> >>       1 cassNode5: StorageService.java Flushing
>>> >>
>>> >> 2013-01-28
>>> >>       2 cassNode0: GCInspector.java Heap
>>> >>       2 cassNode0: StorageService.java Flushing
>>> >>       1 cassNode1: GCInspector.java Heap
>>> >>       1 cassNode1: StorageService.java Flushing
>>> >>       1 cassNode2: AutoSavingCache.java Reducing
>>> >>     343 cassNode2: GCInspector.java Heap
>>> >>     342 cassNode2: StorageService.java Flushing
>>> >>     181 cassNode3: GCInspector.java Heap
>>> >>     181 cassNode3: StorageService.java Flushing
>>> >>       4 cassNode4: GCInspector.java Heap
>>> >>       4 cassNode4: StorageService.java Flushing
>>> >>       3 cassNode5: GCInspector.java Heap
>>> >>       3 cassNode5: StorageService.java Flushing
>>> >>
>>> >> 2013-01-29
>>> >>       2 cassNode0: GCInspector.java Heap
>>> >>       2 cassNode0: StorageService.java Flushing
>>> >>       3 cassNode1: GCInspector.java Heap
>>> >>       3 cassNode1: StorageService.java Flushing
>>> >>     156 cassNode2: GCInspector.java Heap
>>> >>     156 cassNode2: StorageService.java Flushing
>>> >>      71 cassNode3: GCInspector.java Heap
>>> >>      71 cassNode3: StorageService.java Flushing
>>> >>       2 cassNode4: GCInspector.java Heap
>>> >>       2 cassNode4: StorageService.java Flushing
>>> >>       2 cassNode5: GCInspector.java Heap
>>> >>       1 cassNode5: Memtable.java setting
>>> >>       2 cassNode5: StorageService.java Flushing
>>> >>
>>> >> --
>>> >>
>>> >> Guillermo Barbero - Backend Team
>>> >>
>>> >> Spotbros Technologies
>>> >
>>> >
>>> >
>>
>>
>>
>>
>> --
>>
>> Guillermo Barbero - Backend Team
>>
>> Spotbros Technologies
>>
>>
>>
>
>
>
> --
>
> Guillermo Barbero - Backend Team
>
> Spotbros Technologies

Re: too many warnings of Heap is full

Posted by Guillermo Barbero <gu...@spotbros.com>.
Iep,

  I missed the attachment...

  Also, these are the cfstats of 152:

Column Family: CF_SBMessages
SSTable count: 3
Space used (live): 967238560
Space used (total): 967238560
Number of Keys (estimate): 3263232
Memtable Columns Count: 2112
Memtable Data Size: 577135
Memtable Switch Count: 5
Read Count: 20702
Read Latency: 0.181 ms.
Write Count: 19888
Write Latency: 0.059 ms.
Pending Tasks: 0
Bloom Filter False Postives: 14
Bloom Filter False Ratio: 0.09375
Bloom Filter Space Used: 6171168
Compacted row minimum size: 536
Compacted row maximum size: 24601
Compacted row mean size: 850

Column Family: CF_users
SSTable count: 3
Space used (live): 152205376
Space used (total): 152205376
Number of Keys (estimate): 343040
Memtable Columns Count: 154398
Memtable Data Size: 21159410
Memtable Switch Count: 9
Read Count: 11816
Read Latency: 1.348 ms.
Write Count: 128751
Write Latency: 0.090 ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 688464
Compacted row minimum size: 61
Compacted row maximum size: 3311
Compacted row mean size: 1235

And the cfstats of 153:

Column Family: CF_SBMessages
SSTable count: 5
Space used (live): 965495648
Space used (total): 965495648
Number of Keys (estimate): 3257216
Memtable Columns Count: 56541
Memtable Data Size: 15699960
Memtable Switch Count: 2
Read Count: 22475
Read Latency: 0.142 ms.
Write Count: 21719
Write Latency: 0.073 ms.
Pending Tasks: 0
Bloom Filter False Postives: 43
Bloom Filter False Ratio: 0.19545
Bloom Filter Space Used: 6161032
Compacted row minimum size: 536
Compacted row maximum size: 11864
Compacted row mean size: 872

Column Family: CF_users
SSTable count: 2
Space used (live): 148762893
Space used (total): 148762893
Number of Keys (estimate): 333312
Memtable Columns Count: 129725
Memtable Data Size: 17144125
Memtable Switch Count: 4
Read Count: 7440
Read Latency: 1.329 ms.
Write Count: 127465
Write Latency: 0.093 ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 694112
Compacted row minimum size: 61
Compacted row maximum size: 3311
Compacted row mean size: 1298

The messages are in a keyspace with a replication factor of 2 and the
users are in a keyspace with a replication factor of 3.

Ah, and we use the RandomPartitioner.

Thanks again


2013/1/30 Guillermo Barbero <gu...@spotbros.com>
>
> Guys,
>
> First thanks a lot for your answers.. now I have a little more info, lets see if this helps:
>
> As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155) (Amazon EC2 XL instances 17GB RAM) and the problem is that in the last weeks we are seing that the performace of the cluster falls.. PHP commands that usually lasted a few milliseconds last up to 15 seconds for just 2/3 minutes every now and then (no specific pattern) (see commands timing in attached picture).
>
> Today we have found this:
>
> Prior to a fall in performance on all Cassandra nodes log I can see this:
>
> Node 150:
>
> INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db (84559 bytes) for commitlog position ReplayPosition(segmentId=1355151995930, position=32659991)
>  INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]
>
> Node 151:
>
>  INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live bytes, 316 ops)
>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db (93711 bytes) for commitlog position ReplayPosition(segmentId=1354832000013, position=20033362)
>
> Node 153:
>
>  INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line 122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line 122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line 122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is 3886022656
>  INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.4195889775077013 (just-counted was 1.4195889775077013).  calculation took 109ms for 64 columns
>  INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.5276925191259334 (just-counted was 1.5276925191259334).  calculation took 11ms for 129 columns
>  INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.561312033901826 (just-counted was 1.561312033901826).  calculation took 234ms for 144 columns
>  INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1120216043(96278/187900 serialized/live bytes, 288 ops)
>
> Node 154:
>
> INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153
>  INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)
>  INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live bytes, 286 ops)
>
>
> Node 155:
>
> INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1083199446(94578/190654 serialized/live bytes, 316 ops)
>
> Meanwhile on node 152:
>
> INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db (639626 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=11032499)
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line 122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line 57) Pool Name                    Active   Pending   Blocked
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line 72) ReadStage                        32       343         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) RequestResponseStage              0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) ReadRepairStage                   0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line 72) MutationStage                     0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) ReplicateOnWriteStage             0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) GossipStage                       0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line 72) AntiEntropyStage                  0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) MigrationStage                    0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line 72) StreamStage                       0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MemtablePostFlusher               0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) FlushWriter                       0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line 72) MiscStage                         0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) commitlog_archiver                0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) InternalResponseStage             0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line 72) HintedHandoff                     0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 77) CompactionManager                 0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 89) MessagingService                n/a      0,24
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line 99) Cache Type                     Size                 Capacity               KeysToSave                                                         Provider
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 100) KeyCache                     133569                  2184533                      all
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 106) RowCache                          0                        0                      all              org.apache.cassandra.cache.SerializingCacheProvider
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 113) ColumnFamily                Memtable ops,data
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line 116) system.NodeIdInfo                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.IndexInfo                          0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.LocationInfo                       0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.Versions                         3,103
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line 116) system.schema_keyspaces                   0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line 116) system.Migrations                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line 116) system.schema_columnfamilies                 0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.schema_columns                     0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.HintsColumnFamily               1,2369
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line 116) system.Schema                             0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_mailPipes            9241,30380203
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_profilePics            427,1173642
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBVars                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line 116) SB_NNCD.CF_SBCounters                   5,232
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line 116) SB_NNCD.CF_userPhoneAgenda        7919,2332898
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_BSLUsers                       0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_mailTimeline                   0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_SBMessages          54390,15054201
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line 116) SB_NNCD.CF_NewsFeedTimeline        1866,7681861
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_BSLSpots                     1,616
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBUniqueIds                    0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBChecks                       0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_NFTimelines             423,533156
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line 116) SB_NNCD.CF_SBAppUsersProfilePic                 0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBLikes                     16,565
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_SBCodes                  154,93735
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_attachmentsPerUser           80,107551
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line 116) SB_NNCD.CF_comments                  58,77581
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_attachments             641,247796
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_SBStats                        0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_NNCD.CF_userReversedPhoneAgenda        9047,7132129
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line 116) SB_ACD.CF_GSMembers               823,1206779
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line 116) SB_ACD.CF_NSLoad                          0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line 116) SB_ACD.CF_sessions                        0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line 116) SB_ACD.CF_userGS                   193,577229
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_userApps                      8,170
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line 116) SB_ACD.CF_GSData                     126,3503
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line 116) SB_ACD.CF_R2Notifications          1906,58481
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line 116) SB_ACD.CF_uniqueKeys                      0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line 116) SB_ACD.CF_userContactWannabes          953,438671
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line 116) SB_ACD.CF_SBApps                          0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line 116) SB_ACD.CF_SBAppUsers                      0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_R1Notifications        1354,2070513
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line 116) SB_ACD.CF_SBVotes                  463,230386
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line 116) SB_ACD.CF_userContacts           2968,4849519
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_sessionsSuspended                 0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line 116) SB_ACD.CF_userBlockedList               5,207
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line 116) SB_ACD.CF_users                65386,10852776
> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line 145) Heap is 0.7514124302624755 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>  WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line 2880) Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to relieve memory pressure
>  INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes, 9380 ops)
>  INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264) Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live bytes, 9380 ops)
>  INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db (2049401 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=13578602)
>  WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202) setting live ratio to maximum of 64.0 instead of 751.6512549537648
>  INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213) CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is 64.0 (just-counted was 64.0).  calculation took 61ms for 8 columns
>  INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213) CFS(Keyspace='SB_ACD', ColumnFamily='CF_userGS') liveRatio is 5.309650177091864 (just-counted was 5.309650177091864).  calculation took 11ms for 170 columns
>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java (line 296) Started hinted handoff for token: 85070591730234615865843651857942052864 with IP: /10.0.0.153
>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2 ops)
>  INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live bytes, 2 ops)
>  INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db (1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280, position=17323736)
>  INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]
>
> The cluster is perfectly balanced:
>
> Note: Ownership information does not include topology, please specify a keyspace.
> Address         DC          Rack        Status State   Load            Owns                Token
>                                                                                            141784319550391032739561396922763706368
> 10.0.0.150      datacenter1 rack1       Up     Normal  13.25 GB        16.67%              0
> 10.0.0.151      datacenter1 rack1       Up     Normal  10.31 GB        16.67%              28356863910078203714492389662765613056
> 10.0.0.152      datacenter1 rack1       Up     Normal  8.58 GB         16.67%              56713727820156407428984779325531226112
> 10.0.0.153      datacenter1 rack1       Up     Normal  8.1 GB          16.67%              85070591730234615865843651857942052864
> 10.0.0.154      datacenter1 rack1       Up     Normal  8.28 GB         16.67%              113427455640312814857969558651062452224
> 10.0.0.155      datacenter1 rack1       Up     Normal  9.38 GB         16.67%              141784319550391032739561396922763706368
>
>
> I guess something happens on 152 that makes other nodes believe it is dead, then does something to fix itself and comes back after (messing up performance in the meanwhile) .Does this makes any sense? Any clue of what might be going on? I can post more logs if you guys need
>
> Thanks again!!
>
> Guillermo
>
>
>
> 2013/1/30 Nate McCall <zz...@gmail.com>
>>
>> What's the output of nodetool cfstats for those 2 column families on
>> cassNode2 and cassNode3? And what is the replication factor for this
>> cluster?
>>
>> Per the previous reply, nodetool ring should show each of your nodes
>> with ~16.7% of the data if well balanced.
>>
>> Also, the auto-detection for memory sizes in the startup script is a
>> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
>> ram. It usually ends up allocating 4g/400m (max and young) whereas
>> 8g/800m will give you some more breathing room.
>>
>> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <bt...@aeriagames.com> wrote:
>> > My guess is that those one or two nodes with the gc pressure also have more
>> > rows in your big CF.  More rows could be due to imbalanced distribution if
>> > your'e not using a random partitioner or from those nodes not yet removing
>> > deleted rows which other nodes may have done.
>> >
>> > JVM heap space is used for a few things which scale with key count
>> > including:
>> > - bloom filter (for C* < 1.2)
>> > - index samples
>> >
>> > Other space is used but can be more easily controlled by tuning for
>> > - memtable
>> > - compaction
>> > - key cache
>> > - row cache
>> >
>> >
>> > So, if those nodes have more rows (check using "nodetool ring" or "nodetool
>> > cfstats") than the others you can try to:
>> > - reduce the number of rows by adding nodes, run manual / tune compactions
>> > to remove rows with expired tombstones, etc.
>> > - increase bloom filter fp chance
>> > - increase jvm heap size (don't go too big)
>> > - disable key or row cache
>> > - increase index sample interval
>> >
>> > Not all of those things are generally good especially to the extreme so
>> > don't go setting a 20 GB jvm heap without understanding the consequences for
>> > example.
>> >
>> > -Bryan
>> >
>> >
>> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
>> > <gu...@spotbros.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >>   I'm viewing a weird behaviour in my cassandra cluster. Most of the
>> >> warning messages are due to Heap is % full. According to this link
>> >>
>> >> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
>> >> there are two ways to "reduce pressure":
>> >> 1. Decrease the cache sizes
>> >> 2. Increase the index interval size
>> >>
>> >> Most of the flushes are in two column families (users and messages), I
>> >> guess that's because the most mutations are there.
>> >>
>> >> I still have not applied those changes to the production environment.
>> >> Do you recommend any other meassure? Should I set specific tunning for
>> >> these two CFs? Should I check another metric?
>> >>
>> >> Additionally, the distribution of warning messages is not uniform
>> >> along the cluster. Why could cassandra be doing this? What should I do
>> >> to find out how to fix this?
>> >>
>> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>> >> the java version is the following:
>> >> java version "1.6.0_37"
>> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>> >>
>> >> The cassandra system.log is resumed here (numer of messages, cassandra
>> >> node, class that reports the message, first word of the message)
>> >> 2013-01-26
>> >>       5 cassNode0: GCInspector.java Heap
>> >>       5 cassNode0: StorageService.java Flushing
>> >>     232 cassNode2: GCInspector.java Heap
>> >>     232 cassNode2: StorageService.java Flushing
>> >>     104 cassNode3: GCInspector.java Heap
>> >>     104 cassNode3: StorageService.java Flushing
>> >>       3 cassNode4: GCInspector.java Heap
>> >>       3 cassNode4: StorageService.java Flushing
>> >>       3 cassNode5: GCInspector.java Heap
>> >>       3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-27
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       3 cassNode1: GCInspector.java Heap
>> >>       3 cassNode1: StorageService.java Flushing
>> >>     189 cassNode2: GCInspector.java Heap
>> >>     189 cassNode2: StorageService.java Flushing
>> >>     104 cassNode3: GCInspector.java Heap
>> >>     104 cassNode3: StorageService.java Flushing
>> >>       1 cassNode4: GCInspector.java Heap
>> >>       1 cassNode4: StorageService.java Flushing
>> >>       1 cassNode5: GCInspector.java Heap
>> >>       1 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-28
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       1 cassNode1: GCInspector.java Heap
>> >>       1 cassNode1: StorageService.java Flushing
>> >>       1 cassNode2: AutoSavingCache.java Reducing
>> >>     343 cassNode2: GCInspector.java Heap
>> >>     342 cassNode2: StorageService.java Flushing
>> >>     181 cassNode3: GCInspector.java Heap
>> >>     181 cassNode3: StorageService.java Flushing
>> >>       4 cassNode4: GCInspector.java Heap
>> >>       4 cassNode4: StorageService.java Flushing
>> >>       3 cassNode5: GCInspector.java Heap
>> >>       3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-29
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       3 cassNode1: GCInspector.java Heap
>> >>       3 cassNode1: StorageService.java Flushing
>> >>     156 cassNode2: GCInspector.java Heap
>> >>     156 cassNode2: StorageService.java Flushing
>> >>      71 cassNode3: GCInspector.java Heap
>> >>      71 cassNode3: StorageService.java Flushing
>> >>       2 cassNode4: GCInspector.java Heap
>> >>       2 cassNode4: StorageService.java Flushing
>> >>       2 cassNode5: GCInspector.java Heap
>> >>       1 cassNode5: Memtable.java setting
>> >>       2 cassNode5: StorageService.java Flushing
>> >>
>> >> --
>> >>
>> >> Guillermo Barbero - Backend Team
>> >>
>> >> Spotbros Technologies
>> >
>> >
>> >
>
>
>
>
> --
>
> Guillermo Barbero - Backend Team
>
> Spotbros Technologies
>
>
>



--

Guillermo Barbero - Backend Team

Spotbros Technologies

Re: too many warnings of Heap is full

Posted by Derek Williams <de...@fyrie.net>.
My first guess is the 3 ParNew collections taking 35 seconds is why your
clients had long response times. That is a very abnormal stat. Combined
with the low memory usage at that time, it seems highly unlikely there was
actually 35 seconds worth of GCing going on there. Unless there is anything
interesting in the lines above that snippet.

I'd first double check that your new generation hasn't
been accidentally set to something insanely large or small, just to rule
that out (but judging from the 4G of heap that is set you are most likely
using defaults, which is good). After that I'd try replacing that node as
it could be a hardware problem.

Another reason I have seen large pauses on a linux VPS that might be worth
looking into is low entropy (possibly due to someone else abusing your
shared hardware).


On Wed, Jan 30, 2013 at 3:44 PM, Guillermo Barbero <
guillermo.barbero@spotbros.com> wrote:

> Guys,
>
> First thanks a lot for your answers.. now I have a little more info, lets
> see if this helps:
>
> As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155)
> (Amazon EC2 XL instances 17GB RAM) and the problem is that in the last
> weeks we are seing that the performace of the cluster falls.. PHP commands
> that usually lasted a few milliseconds last up to 15 seconds for just 2/3
> minutes every now and then (no specific pattern) (see commands timing in
> attached picture).
>
> Today we have found this:
>
> Prior to a fall in performance on all Cassandra nodes log I can see this:
>
> *Node 150:*
>
> INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1264710747(84317/476272
> serialized/live bytes, 298 ops)
>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@1264710747(84317/476272
> serialized/live bytes, 298 ops)
>  INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db
> (84559 bytes) for commitlog position
> ReplayPosition(segmentId=1355151995930, position=32659991)
>  INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283
> CompactionTask.java (line 109) Compacting
> [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'),
> SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]
>
> *Node 151:*
> *
> *
>  INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1453632632(93461/191418
> serialized/live bytes, 316 ops)
>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@1453632632(93461/191418
> serialized/live bytes, 316 ops)
>  INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db
> (93711 bytes) for commitlog position
> ReplayPosition(segmentId=1354832000013, position=20033362)
>
> *Node 153:*
>
> * *INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line
> 122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is
> 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line
> 122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is
> 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line
> 122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is
> 3886022656
>  INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213)
> CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
> 1.4195889775077013 (just-counted was 1.4195889775077013).  calculation took
> 109ms for 64 columns
>  INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213)
> CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
> 1.5276925191259334 (just-counted was 1.5276925191259334).  calculation took
> 11ms for 129 columns
>  INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213)
> CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
> 1.561312033901826 (just-counted was 1.561312033901826).  calculation took
> 234ms for 144 columns
>  INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1120216043(96278/187900
> serialized/live bytes, 288 ops)
>
> *Node 154:*
> *
> *
> INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java
> (line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153
>  INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145
> serialized/live bytes, 286 ops)
>  INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live
> bytes, 286 ops)
>
>
> Node 155:
>
> INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831)
> InetAddress /10.0.0.152 is now dead.
>  INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817)
> InetAddress /10.0.0.152 is now UP
>  INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 56713727820156407428984779325531226112 with IP: /10.0.0.152
>  INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1083199446(94578/190654
> serialized/live bytes, 316 ops)
>
> *Meanwhile on node 152:*
>
> INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db
> (639626 bytes) for commitlog position
> ReplayPosition(segmentId=1359573115280, position=11032499)
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line
> 122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is
> 3886022656
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line
> 57) Pool Name                    Active   Pending   Blocked
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line
> 72) ReadStage                        32       343         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
> 72) RequestResponseStage              0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
> 72) ReadRepairStage                   0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
> 72) MutationStage                     0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
> 72) ReplicateOnWriteStage             0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
> 72) GossipStage                       0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
> 72) AntiEntropyStage                  0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
> 72) MigrationStage                    0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
> 72) StreamStage                       0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
> 72) MemtablePostFlusher               0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
> 72) FlushWriter                       0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
> 72) MiscStage                         0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
> 72) commitlog_archiver                0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
> 72) InternalResponseStage             0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
> 72) HintedHandoff                     0         0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
> 77) CompactionManager                 0         0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
> 89) MessagingService                n/a      0,24
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
> 99) Cache Type                     Size                 Capacity
>     KeysToSave
> Provider
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 100) KeyCache                     133569                  2184533
>            all
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 106) RowCache                          0                        0
>            all
>  org.apache.cassandra.cache.SerializingCacheProvider
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 113) ColumnFamily                Memtable ops,data
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
> 116) system.NodeIdInfo                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.IndexInfo                          0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.LocationInfo                       0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.Versions                         3,103
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
> 116) system.schema_keyspaces                   0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line
> 116) system.Migrations                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line
> 116) system.schema_columnfamilies                 0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
> 116) system.schema_columns                     0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
> 116) system.HintsColumnFamily               1,2369
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
> 116) system.Schema                             0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_mailPipes            9241,30380203
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_profilePics            427,1173642
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_SBVars                         0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
> 116) SB_NNCD.CF_SBCounters                   5,232
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line
> 116) SB_NNCD.CF_userPhoneAgenda        7919,2332898
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_BSLUsers                       0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_mailTimeline                   0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_SBMessages          54390,15054201
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
> 116) SB_NNCD.CF_NewsFeedTimeline        1866,7681861
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_BSLSpots                     1,616
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_SBUniqueIds                    0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_SBChecks                       0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_NFTimelines             423,533156
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
> 116) SB_NNCD.CF_SBAppUsersProfilePic                 0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_SBLikes                     16,565
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_SBCodes                  154,93735
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_attachmentsPerUser           80,107551
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
> 116) SB_NNCD.CF_comments                  58,77581
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_NNCD.CF_attachments             641,247796
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_NNCD.CF_SBStats                        0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_NNCD.CF_userReversedPhoneAgenda        9047,7132129
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
> 116) SB_ACD.CF_GSMembers               823,1206779
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line
> 116) SB_ACD.CF_NSLoad                          0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line
> 116) SB_ACD.CF_sessions                        0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line
> 116) SB_ACD.CF_userGS                   193,577229
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
> 116) SB_ACD.CF_userApps                      8,170
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
> 116) SB_ACD.CF_GSData                     126,3503
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line
> 116) SB_ACD.CF_R2Notifications          1906,58481
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line
> 116) SB_ACD.CF_uniqueKeys                      0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line
> 116) SB_ACD.CF_userContactWannabes          953,438671
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line
> 116) SB_ACD.CF_SBApps                          0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line
> 116) SB_ACD.CF_SBAppUsers                      0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
> 116) SB_ACD.CF_R1Notifications        1354,2070513
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
> 116) SB_ACD.CF_SBVotes                  463,230386
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line
> 116) SB_ACD.CF_userContacts           2968,4849519
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
> 116) SB_ACD.CF_sessionsSuspended                 0,0
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
> 116) SB_ACD.CF_userBlockedList               5,207
>  INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line
> 116) SB_ACD.CF_users                65386,10852776
> WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line
> 145) Heap is 0.7514124302624755 full.  You may need to reduce memtable
> and/or cache sizes.  Cassandra will now flush up to the two largest
> memtables to free up memory.  Adjust flush_largest_memtables_at threshold
> in cassandra.yaml if you don't want Cassandra to do this automatically
>  WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line
> 2880) Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to
> relieve memory pressure
>  INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355
> serialized/live bytes, 9380 ops)
>  INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264)
> Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live
> bytes, 9380 ops)
>  INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db
> (2049401 bytes) for commitlog position
> ReplayPosition(segmentId=1359573115280, position=13578602)
>  WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
> setting live ratio to maximum of 64.0 instead of 751.6512549537648
>  INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213)
> CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is 64.0
> (just-counted was 64.0).  calculation took 61ms for 8 columns
>  INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213)
> CFS(Keyspace='SB_ACD', ColumnFamily='CF_userGS') liveRatio is
> 5.309650177091864 (just-counted was 5.309650177091864).  calculation took
> 11ms for 170 columns
>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java
> (line 296) Started hinted handoff for token:
> 85070591730234615865843651857942052864 with IP: /10.0.0.153
>  INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java
> (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369
> serialized/live bytes, 2 ops)
>  INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264)
> Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live
> bytes, 2 ops)
>  INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305)
> Completed flushing
> /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db
> (1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280,
> position=17323736)
>  INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java
> (line 109) Compacting
> [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'),
> SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]
>
> The cluster is perfectly balanced:
>
> Note: Ownership information does not include topology, please specify a
> keyspace.
> Address         DC          Rack        Status State   Load
>  Owns                Token
>
>                  141784319550391032739561396922763706368
> 10.0.0.150      datacenter1 rack1       Up     Normal  13.25 GB
>  16.67%              0
> 10.0.0.151      datacenter1 rack1       Up     Normal  10.31 GB
>  16.67%              28356863910078203714492389662765613056
> 10.0.0.152      datacenter1 rack1       Up     Normal  8.58 GB
> 16.67%              56713727820156407428984779325531226112
> 10.0.0.153      datacenter1 rack1       Up     Normal  8.1 GB
>  16.67%              85070591730234615865843651857942052864
> 10.0.0.154      datacenter1 rack1       Up     Normal  8.28 GB
> 16.67%              113427455640312814857969558651062452224
> 10.0.0.155      datacenter1 rack1       Up     Normal  9.38 GB
> 16.67%              141784319550391032739561396922763706368
>
>
> I guess something happens on 152 that makes other nodes believe it is
> dead, then does something to fix itself and comes back after (messing up
> performance in the meanwhile) .Does this makes any sense? Any clue of what
> might be going on? I can post more logs if you guys need
>
> Thanks again!!
>
> Guillermo
>
>
>
> 2013/1/30 Nate McCall <zz...@gmail.com>
>
>> What's the output of nodetool cfstats for those 2 column families on
>> cassNode2 and cassNode3? And what is the replication factor for this
>> cluster?
>>
>> Per the previous reply, nodetool ring should show each of your nodes
>> with ~16.7% of the data if well balanced.
>>
>> Also, the auto-detection for memory sizes in the startup script is a
>> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
>> ram. It usually ends up allocating 4g/400m (max and young) whereas
>> 8g/800m will give you some more breathing room.
>>
>> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <bt...@aeriagames.com>
>> wrote:
>> > My guess is that those one or two nodes with the gc pressure also have
>> more
>> > rows in your big CF.  More rows could be due to imbalanced distribution
>> if
>> > your'e not using a random partitioner or from those nodes not yet
>> removing
>> > deleted rows which other nodes may have done.
>> >
>> > JVM heap space is used for a few things which scale with key count
>> > including:
>> > - bloom filter (for C* < 1.2)
>> > - index samples
>> >
>> > Other space is used but can be more easily controlled by tuning for
>> > - memtable
>> > - compaction
>> > - key cache
>> > - row cache
>> >
>> >
>> > So, if those nodes have more rows (check using "nodetool ring" or
>> "nodetool
>> > cfstats") than the others you can try to:
>> > - reduce the number of rows by adding nodes, run manual / tune
>> compactions
>> > to remove rows with expired tombstones, etc.
>> > - increase bloom filter fp chance
>> > - increase jvm heap size (don't go too big)
>> > - disable key or row cache
>> > - increase index sample interval
>> >
>> > Not all of those things are generally good especially to the extreme so
>> > don't go setting a 20 GB jvm heap without understanding the
>> consequences for
>> > example.
>> >
>> > -Bryan
>> >
>> >
>> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
>> > <gu...@spotbros.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >>   I'm viewing a weird behaviour in my cassandra cluster. Most of the
>> >> warning messages are due to Heap is % full. According to this link
>> >>
>> >> (
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html
>> )
>> >> there are two ways to "reduce pressure":
>> >> 1. Decrease the cache sizes
>> >> 2. Increase the index interval size
>> >>
>> >> Most of the flushes are in two column families (users and messages), I
>> >> guess that's because the most mutations are there.
>> >>
>> >> I still have not applied those changes to the production environment.
>> >> Do you recommend any other meassure? Should I set specific tunning for
>> >> these two CFs? Should I check another metric?
>> >>
>> >> Additionally, the distribution of warning messages is not uniform
>> >> along the cluster. Why could cassandra be doing this? What should I do
>> >> to find out how to fix this?
>> >>
>> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>> >> the java version is the following:
>> >> java version "1.6.0_37"
>> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>> >>
>> >> The cassandra system.log is resumed here (numer of messages, cassandra
>> >> node, class that reports the message, first word of the message)
>> >> 2013-01-26
>> >>       5 cassNode0: GCInspector.java Heap
>> >>       5 cassNode0: StorageService.java Flushing
>> >>     232 cassNode2: GCInspector.java Heap
>> >>     232 cassNode2: StorageService.java Flushing
>> >>     104 cassNode3: GCInspector.java Heap
>> >>     104 cassNode3: StorageService.java Flushing
>> >>       3 cassNode4: GCInspector.java Heap
>> >>       3 cassNode4: StorageService.java Flushing
>> >>       3 cassNode5: GCInspector.java Heap
>> >>       3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-27
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       3 cassNode1: GCInspector.java Heap
>> >>       3 cassNode1: StorageService.java Flushing
>> >>     189 cassNode2: GCInspector.java Heap
>> >>     189 cassNode2: StorageService.java Flushing
>> >>     104 cassNode3: GCInspector.java Heap
>> >>     104 cassNode3: StorageService.java Flushing
>> >>       1 cassNode4: GCInspector.java Heap
>> >>       1 cassNode4: StorageService.java Flushing
>> >>       1 cassNode5: GCInspector.java Heap
>> >>       1 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-28
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       1 cassNode1: GCInspector.java Heap
>> >>       1 cassNode1: StorageService.java Flushing
>> >>       1 cassNode2: AutoSavingCache.java Reducing
>> >>     343 cassNode2: GCInspector.java Heap
>> >>     342 cassNode2: StorageService.java Flushing
>> >>     181 cassNode3: GCInspector.java Heap
>> >>     181 cassNode3: StorageService.java Flushing
>> >>       4 cassNode4: GCInspector.java Heap
>> >>       4 cassNode4: StorageService.java Flushing
>> >>       3 cassNode5: GCInspector.java Heap
>> >>       3 cassNode5: StorageService.java Flushing
>> >>
>> >> 2013-01-29
>> >>       2 cassNode0: GCInspector.java Heap
>> >>       2 cassNode0: StorageService.java Flushing
>> >>       3 cassNode1: GCInspector.java Heap
>> >>       3 cassNode1: StorageService.java Flushing
>> >>     156 cassNode2: GCInspector.java Heap
>> >>     156 cassNode2: StorageService.java Flushing
>> >>      71 cassNode3: GCInspector.java Heap
>> >>      71 cassNode3: StorageService.java Flushing
>> >>       2 cassNode4: GCInspector.java Heap
>> >>       2 cassNode4: StorageService.java Flushing
>> >>       2 cassNode5: GCInspector.java Heap
>> >>       1 cassNode5: Memtable.java setting
>> >>       2 cassNode5: StorageService.java Flushing
>> >>
>> >> --
>> >>
>> >> Guillermo Barbero - Backend Team
>> >>
>> >> Spotbros Technologies
>> >
>> >
>> >
>>
>
>
>
> --
> <http://spotbros.com>
> Guillermo Barbero - Backend Team
>
> Spotbros Technologies
> <http://www.facebook.com/spotbros>  <http://twitter.com/spotbros>
>
>


-- 
Derek Williams

Re: too many warnings of Heap is full

Posted by Guillermo Barbero <gu...@spotbros.com>.
> What is the cardinality like on these indexes? Can you provide the
> schema creation for these two column families?

This is the schema of the CFs:

create column family CF_users
                with comparator = UTF8Type
                and column_metadata =
                [
                        {column_name: userSBCode, validation_class:
UTF8Type, index_type: KEYS},
                        {column_name: userEmail, validation_class:
UTF8Type, index_type: KEYS},
                        {column_name: userName, validation_class: UTF8Type},
                        {column_name: userLastName, validation_class: UTF8Type},
                        {column_name: userOwnPhoneKey,
validation_class: UTF8Type, index_type: KEYS},
                        {column_name: userOwnPhone,
validation_class: UTF8Type, index_type: KEYS},
                        {column_name: userPasswordMD5,
validation_class: UTF8Type},
                        {column_name: userDOB, validation_class: UTF8Type},
                        {column_name: userGender, validation_class: UTF8Type},
                        {column_name: userProfilePicMD5,
validation_class: UTF8Type},
                        {column_name: userAbout, validation_class: UTF8Type},
                        {column_name: userLastSession,
validation_class: UTF8Type}
                        {column_name: userMasterKey, validation_class: UTF8Type}
                ];

create column family CF_SBMessages
                with comparator = UTF8Type
                and column_metadata =
                [

                        {column_name: SBMessageId, validation_class:
UTF8Type, index_type: KEYS},
                        {column_name: fromSBCode, validation_class:
UTF8Type, index_type: KEYS},
                        {column_name: SBMessageDate, validation_class:
UTF8Type, index_type: KEYS},
                        {column_name: SBMessageType, validation_class:
UTF8Type},
                        {column_name: SBMessageText, validation_class:
UTF8Type},
                        {column_name: SBMessageAttachments,
validation_class: UTF8Type},
                ];


I've read about the importance of keeping the cardinality of the
secondary indexes low (great article at
http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/),
and I'm afraid that we did completely the opposite (we did consider
the secondary indexes as alternate indexes).
I guess here is some work to do to create other CFs to store these
secondary indexes.

Anyway, I still don't understand why did appear these peaks (by the
way, last night there wasn't any)

Re: too many warnings of Heap is full

Posted by Nate McCall <zz...@gmail.com>.
>
> There are 4 secondary indexes for users and 3 for messages. I've seen
> something odd here: the index of the CFs is also stored in a column
> AND is also indexed. There isn't any query by that secondary index. I
> should probably erase the index, shouldn't I?

What is the cardinality like on these indexes? Can you provide the
schema creation for these two column families?

Re: too many warnings of Heap is full

Posted by Guillermo Barbero <gu...@spotbros.com>.
Hi, replies to your answers:

The CFs that seems to flush more often (users and messages) do not
have multiget queries...

There are 4 secondary indexes for users and 3 for messages. I've seen
something odd here: the index of the CFs is also stored in a column
AND is also indexed. There isn't any query by that secondary index. I
should probably erase the index, shouldn't I?

I post some lines of the log before 21:35:

Node 150:
 INFO [HintedHandoff:1] 2013-01-30 21:07:43,795
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:07:43,796
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:17:43,796
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:17:43,796
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:27:43,796
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:27:43,797
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153

Node 151:
 INFO [HintedHandoff:1] 2013-01-30 21:04:45,821
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:04:45,822
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [ScheduledTasks:1] 2013-01-30 21:14:30,112 GCInspector.java
(line 122) GC for ParNew: 213 ms for 1 collections, 1920192480 used;
max is 3886022656
 INFO [HintedHandoff:1] 2013-01-30 21:14:45,821
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:14:45,822
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:24:45,822
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:24:45,822
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [ScheduledTasks:1] 2013-01-30 21:28:25,294 GCInspector.java
(line 122) GC for ParNew: 236 ms for 1 collections, 2264010544 used;
max is 3886022656
 INFO [HintedHandoff:1] 2013-01-30 21:34:45,822
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:34:45,822
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153

Node 152:
 INFO [HintedHandoff:1] 2013-01-30 21:02:03,706
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:02:03,707
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 WARN [MemoryMeter:1] 2013-01-30 21:09:57,220 Memtable.java (line 197)
setting live ratio to minimum of 1.0 instead of 0.15873949606408785
 INFO [MemoryMeter:1] 2013-01-30 21:09:57,221 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_profilePics') liveRatio is
1.0 (just-counted was 1.0).  calculation took 3ms for 301 columns
 INFO [MemoryMeter:1] 2013-01-30 21:09:57,223 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD',
ColumnFamily='CF_profilePics.CF_profilePics_pPicMD5_idx') liveRatio is
18.952013237727524 (just-counted was 18.952013237727524).  calculation
took 2ms for 74 columns
 INFO [MemoryMeter:1] 2013-01-30 21:09:57,242 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD',
ColumnFamily='CF_profilePics.CF_profilePics_userSBCode_idx') liveRatio
is 12.820966340897906 (just-counted was 12.209597352454495).
calculation took 19ms for 74 columns
 INFO [HintedHandoff:1] 2013-01-30 21:12:03,707
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:12:03,707
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [MemoryMeter:1] 2013-01-30 21:12:47,775 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_NewsFeedTimeline') liveRatio
is 3.605170679607766 (just-counted was 3.4883667978539283).
calculation took 110ms for 1462 columns
 INFO [HintedHandoff:1] 2013-01-30 21:22:03,707
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:22:03,708
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [MemoryMeter:1] 2013-01-30 21:24:14,795 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_GSMembers') liveRatio is
5.785411930027831 (just-counted was 5.622629145582293).  calculation
took 24ms for 463 columns
 WARN [ScheduledTasks:1] 2013-01-30 21:28:12,744 GCInspector.java
(line 145) Heap is 0.7609735541387436 full.  You may need to reduce
memtable and/or cache sizes.  Cassandra will now flush up to the two
largest memtables to free up memory.  Adjust
flush_largest_memtables_at threshold in cassandra.yaml if you don't
want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-01-30 21:28:12,744 StorageService.java
(line 2880) Flushing CFS(Keyspace='SB_ACD', ColumnFamily='CF_users')
to relieve memory pressure
 INFO [ScheduledTasks:1] 2013-01-30 21:28:12,745
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_users.CF_users_userEmail_idx@1131307878(146/182
serialized/live bytes, 4 ops)
 INFO [ScheduledTasks:1] 2013-01-30 21:28:12,755
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_users.CF_users_userOwnPhone_idx@973751772(146/182
serialized/live bytes, 4 ops)
 INFO [ScheduledTasks:1] 2013-01-30 21:28:12,755
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_users.CF_users_userOwnPhoneKey_idx@2020641864(146/182
serialized/live bytes, 4 ops)
 INFO [ScheduledTasks:1] 2013-01-30 21:28:12,756
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_users.CF_users_userSBCode_idx@621414590(146/182
serialized/live bytes, 4 ops)
 INFO [ScheduledTasks:1] 2013-01-30 21:28:12,756
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_users@902038769(9358482/45802966 serialized/live bytes,
405140 ops)
 INFO [FlushWriter:11] 2013-01-30 21:28:12,757 Memtable.java (line
264) Writing Memtable-CF_users.CF_users_userEmail_idx@1131307878(146/182
serialized/live bytes, 4 ops)
 INFO [FlushWriter:11] 2013-01-30 21:28:12,768 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userEmail_idx-hf-2231-Data.db
(234 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=6624766)
 INFO [FlushWriter:11] 2013-01-30 21:28:12,781 Memtable.java (line
264) Writing Memtable-CF_users.CF_users_userOwnPhone_idx@973751772(146/182
serialized/live bytes, 4 ops)
 INFO [FlushWriter:11] 2013-01-30 21:28:12,793 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2217-Data.db
(265 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=6624766)
 INFO [FlushWriter:11] 2013-01-30 21:28:12,802 Memtable.java (line
264) Writing Memtable-CF_users.CF_users_userOwnPhoneKey_idx@2020641864(146/182
serialized/live bytes, 4 ops)
 INFO [CompactionExecutor:42] 2013-01-30 21:28:12,813
CompactionTask.java (line 109) Compacting
[SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2217-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2215-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2216-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2214-Data.db')]
 INFO [FlushWriter:11] 2013-01-30 21:28:12,814 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2209-Data.db
(268 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=6624766)
 INFO [FlushWriter:11] 2013-01-30 21:28:12,824 Memtable.java (line
264) Writing Memtable-CF_users.CF_users_userSBCode_idx@621414590(146/182
serialized/live bytes, 4 ops)
 INFO [CompactionExecutor:43] 2013-01-30 21:28:12,826
CompactionTask.java (line 109) Compacting
[SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2206-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2208-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2209-Data.db'),
SSTableReader(path='/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2207-Data.db')]
 INFO [FlushWriter:11] 2013-01-30 21:28:12,835 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userSBCode_idx-hf-2205-Data.db
(263 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=6624766)
 INFO [FlushWriter:11] 2013-01-30 21:28:12,838 Memtable.java (line
264) Writing Memtable-CF_users@902038769(9358482/45802966
serialized/live bytes, 405140 ops)
 INFO [FlushWriter:11] 2013-01-30 21:28:13,513 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users-hf-3989-Data.db
(3358487 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=6624766)
 INFO [CompactionExecutor:42] 2013-01-30 21:28:28,209
CompactionTask.java (line 221) Compacted to
[/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhone_idx-hf-2218-Data.db,].
 12,758,536 to 12,758,266 (~99% of original) bytes for 298,733 keys at
0.790337MB/s.  Time: 15,395ms.
 INFO [CompactionExecutor:43] 2013-01-30 21:28:28,258
CompactionTask.java (line 221) Compacted to
[/raid0/cassandra/data/SB_ACD/CF_users/SB_ACD-CF_users.CF_users_userOwnPhoneKey_idx-hf-2210-Data.db,].
 12,827,751 to 12,827,331 (~99% of original) bytes for 298,767 keys at
0.792812MB/s.  Time: 15,430ms.
 INFO [MemoryMeter:1] 2013-01-30 21:31:30,638 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_userPhoneAgenda') liveRatio
is 9.966352958143455 (just-counted was 6.201877882350339).
calculation took 24ms for 6574 columns
 INFO [HintedHandoff:1] 2013-01-30 21:32:03,707
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:32:03,708
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 WARN [ScheduledTasks:1] 2013-01-30 21:33:41,079 GCInspector.java
(line 145) Heap is 0.7632044026868381 full.  You may need to reduce
memtable and/or cache sizes.  Cassandra will now flush up to the two
largest memtables to free up memory.  Adjust
flush_largest_memtables_at threshold in cassandra.yaml if you don't
want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-01-30 21:33:41,079 StorageService.java
(line 2880) Flushing CFS(Keyspace='SB_ACD',
ColumnFamily='CF_R2Notifications') to relieve memory pressure
 INFO [ScheduledTasks:1] 2013-01-30 21:33:41,080
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_R2Notifications@1187465569(760532/33771575 serialized/live
bytes, 93136 ops)
 INFO [FlushWriter:12] 2013-01-30 21:33:41,105 Memtable.java (line
264) Writing Memtable-CF_R2Notifications@1187465569(760532/33771575
serialized/live bytes, 93136 ops)
 INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line
305) Completed flushing
/raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db
(639626 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=11032499)

Node 153:
 INFO [HintedHandoff:1] 2013-01-30 21:01:45,310
HintedHandOffManager.java (line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:01:45,311
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.152
 INFO [MemoryMeter:1] 2013-01-30 21:05:45,297 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_users') liveRatio is
3.9183034832767603 (just-counted was 3.9183034832767603).  calculation
took 255ms for 58995 columns
 INFO [MemoryMeter:1] 2013-01-30 21:07:24,736 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_NewsFeedTimeline') liveRatio
is 3.6493489644964034 (just-counted was 3.471672980060775).
calculation took 107ms for 1395 columns
 INFO [ScheduledTasks:1] 2013-01-30 21:07:41,673 GCInspector.java
(line 122) GC for ParNew: 215 ms for 1 collections, 2310767440 used;
max is 3886022656
 INFO [HintedHandoff:1] 2013-01-30 21:11:45,316
HintedHandOffManager.java (line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:11:45,318
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.152
 INFO [MemoryMeter:1] 2013-01-30 21:14:40,052 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD',
ColumnFamily='CF_profilePics.CF_profilePics_pPicMD5_idx') liveRatio is
18.87210642710637 (just-counted was 18.854822335025382).  calculation
took 3ms for 80 columns
 INFO [MemoryMeter:1] 2013-01-30 21:14:40,054 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD',
ColumnFamily='CF_profilePics.CF_profilePics_userSBCode_idx') liveRatio
is 12.252182911928061 (just-counted was 11.273365071410675).
calculation took 2ms for 81 columns
 WARN [MemoryMeter:1] 2013-01-30 21:14:40,072 Memtable.java (line 197)
setting live ratio to minimum of 1.0 instead of 0.12200431775304404
 INFO [MemoryMeter:1] 2013-01-30 21:14:40,072 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_profilePics') liveRatio is
1.0 (just-counted was 1.0).  calculation took 3ms for 315 columns
 INFO [ScheduledTasks:1] 2013-01-30 21:14:48,898 GCInspector.java
(line 122) GC for ParNew: 201 ms for 1 collections, 2512549552 used;
max is 3886022656
 INFO [MemoryMeter:1] 2013-01-30 21:15:32,061 Memtable.java (line 213)
CFS(Keyspace='SB_NNCD', ColumnFamily='CF_SBCodes') liveRatio is
32.58987080513604 (just-counted was 31.082180634662326).  calculation
took 2ms for 52 columns
 INFO [ScheduledTasks:1] 2013-01-30 21:21:08,322 GCInspector.java
(line 122) GC for ParNew: 239 ms for 1 collections, 2576069224 used;
max is 3886022656
 INFO [HintedHandoff:1] 2013-01-30 21:21:45,311
HintedHandOffManager.java (line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:21:45,312
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.152
 INFO [MemoryMeter:1] 2013-01-30 21:23:46,635 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_GSData') liveRatio is
7.342488214798115 (just-counted was 7.342488214798115).  calculation
took 0ms for 116 columns
 INFO [MemoryMeter:1] 2013-01-30 21:24:14,793 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_GSMembers') liveRatio is
5.753869723542124 (just-counted was 5.630372102807786).  calculation
took 24ms for 456 columns
 INFO [ScheduledTasks:1] 2013-01-30 21:30:18,511 GCInspector.java
(line 122) GC for ParNew: 226 ms for 1 collections, 2242332984 used;
max is 3886022656
 INFO [MemoryMeter:1] 2013-01-30 21:30:46,426 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_R1Notifications') liveRatio is
6.832365653565714 (just-counted was 6.675185638944877).  calculation
took 43ms for 860 columns
 INFO [HintedHandoff:1] 2013-01-30 21:31:45,311
HintedHandOffManager.java (line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:31:45,313
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.152
 INFO [ScheduledTasks:1] 2013-01-30 21:33:12,105 GCInspector.java
(line 122) GC for ParNew: 261 ms for 1 collections, 2603400728 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:33:21,312 GCInspector.java
(line 122) GC for ParNew: 248 ms for 1 collections, 3191815000 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:33:23,315 GCInspector.java
(line 122) GC for ParNew: 201 ms for 1 collections, 3471315456 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:33:35,323 GCInspector.java
(line 122) GC for ParNew: 204 ms for 1 collections, 1636739048 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:33:51,462 GCInspector.java
(line 122) GC for ParNew: 239 ms for 1 collections, 2789318016 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:34:00,805 GCInspector.java
(line 122) GC for ParNew: 217 ms for 1 collections, 2959924968 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:34:05,008 GCInspector.java
(line 122) GC for ParNew: 217 ms for 1 collections, 2900952792 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:34:17,262 GCInspector.java
(line 122) GC for ParNew: 231 ms for 1 collections, 2368279416 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:34:19,267 GCInspector.java
(line 122) GC for ParNew: 201 ms for 1 collections, 2297887816 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:34:28,326 GCInspector.java
(line 122) GC for ParNew: 215 ms for 1 collections, 2892129512 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:34:30,328 GCInspector.java
(line 122) GC for ParNew: 266 ms for 1 collections, 3168051328 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:34:42,552 GCInspector.java
(line 122) GC for ParNew: 249 ms for 1 collections, 2330298920 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:34:46,554 GCInspector.java
(line 122) GC for ParNew: 243 ms for 1 collections, 2496880592 used;
max is 3886022656
 WARN [ScheduledTasks:1] 2013-01-30 21:35:02,994 GCInspector.java
(line 145) Heap is 0.7641879728660028 full.  You may need to reduce
memtable and/or cache sizes.  Cassandra will now flush up to the two
largest memtables to free up memory.  Adjust
flush_largest_memtables_at threshold in cassandra.yaml if you don't
want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-01-30 21:35:02,995 StorageService.java
(line 2880) Flushing CFS(Keyspace='SB_ACD',
ColumnFamily='CF_R2Notifications') to relieve memory pressure
 INFO [ScheduledTasks:1] 2013-01-30 21:35:02,995
ColumnFamilyStore.java (line 659) Enqueuing flush of
Memtable-CF_R2Notifications@1630759928(1137418/88800160
serialized/live bytes, 171909 ops)
 INFO [FlushWriter:4] 2013-01-30 21:35:03,016 Memtable.java (line 264)
Writing Memtable-CF_R2Notifications@1630759928(1137418/88800160
serialized/live bytes, 171909 ops)
 INFO [ScheduledTasks:1] 2013-01-30 21:35:04,293 GCInspector.java
(line 122) GC for ParNew: 455 ms for 2 collections, 2926515240 used;
max is 3886022656
 INFO [FlushWriter:4] 2013-01-30 21:35:05,798 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-3832-Data.db
(790256 bytes) for commitlog position
ReplayPosition(segmentId=1359573697867, position=11748063)
 INFO [ScheduledTasks:1] 2013-01-30 21:35:06,344 GCInspector.java
(line 122) GC for ParNew: 204 ms for 1 collections, 3227837984 used;
max is 3886022656
 INFO [MemoryMeter:1] 2013-01-30 21:35:06,369 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is
37.62080440650366 (just-counted was 12.784246424797216).  calculation
took 7181ms for 2683 columns
 INFO [ScheduledTasks:1] 2013-01-30 21:35:08,345 GCInspector.java
(line 122) GC for ParNew: 216 ms for 1 collections, 3586092576 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:15,625 GCInspector.java
(line 122) GC for ConcurrentMarkSweep: 1130 ms for 2 collections,
602554696 used; max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:22,887 GCInspector.java
(line 122) GC for ParNew: 253 ms for 1 collections, 1362776544 used;
max is 3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:23,888 GCInspector.java
(line 122) GC for ParNew: 313 ms for 1 collections, 1473067952 used;
max is 3886022656

Node 154:
 INFO [HintedHandoff:1] 2013-01-30 21:05:00,728
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:05:00,728
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:15:00,728
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:15:00,729
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [ScheduledTasks:1] 2013-01-30 21:21:28,248 GCInspector.java
(line 122) GC for ParNew: 356 ms for 1 collections, 1272387016 used;
max is 3886022656
 INFO [HintedHandoff:1] 2013-01-30 21:25:00,728
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:25:00,729
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:35:00,728
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:35:00,729
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153

Node 155:
 INFO [HintedHandoff:1] 2013-01-30 21:05:07,916
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:05:07,917
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:15:07,916
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:15:07,917
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:25:07,916
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:25:07,918
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:35:07,917
HintedHandOffManager.java (line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:35:07,918
HintedHandOffManager.java (line 392) Finished hinted handoff of 0 rows
to endpoint /10.0.0.153

I've read that the message 'Finished hinted handoff of 0 rows to
endpoint ...' is because a tombstone not being deleted, and that I
shouldn't worry about it.

I'm also talking with Amazon to check if there is a problem with their
service, the nodes 152 and 153 where using more cpu than the other
four nodes (but they have been like this all over the week, not just a
peak at the very same our for the last 3 days)

All the Cassandra configuration is at default values (the heap size,
row caches disabled, etc.)

Thanks for your support,


2013/1/31 Alain RODRIGUEZ <ar...@gmail.com>:
> @Bryan
>
> "Other space is used but can be more easily controlled by tuning for
> - memtable
> - compaction
> - key cache
> - row cache"
>
> Isn't row cache stored off-heap since a while ?
>
>
> 2013/1/31 Bryan Talbot <bt...@aeriagames.com>
>>
>> On Wed, Jan 30, 2013 at 2:44 PM, Guillermo Barbero
>> <gu...@spotbros.com> wrote:
>>>
>>>  WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
>>> setting live ratio to maximum of 64.0 instead of 751.6512549537648
>>>
>>
>> This looks interesting.  Doesn't this mean that the ratio of space used
>> for that CF for memory to serialized form is 751:1 but was forced to a lower
>> "sane" value?
>>
>> -Bryan
>>
>



-- 

Guillermo Barbero - Backend Team

Spotbros Technologies

Re: too many warnings of Heap is full

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
@Bryan

"Other space is used but can be more easily controlled by tuning for
- memtable
- compaction
- key cache
- *row cache*"

Isn't row cache stored off-heap since a while ?

2013/1/31 Bryan Talbot <bt...@aeriagames.com>

> On Wed, Jan 30, 2013 at 2:44 PM, Guillermo Barbero <
> guillermo.barbero@spotbros.com> wrote:
>
>>  WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
>> setting live ratio to maximum of 64.0 instead of 751.6512549537648
>>
>>
> This looks interesting.  Doesn't this mean that the ratio of space used
> for that CF for memory to serialized form is 751:1 but was forced to a
> lower "sane" value?
>
> -Bryan
>
>

Re: too many warnings of Heap is full

Posted by Bryan Talbot <bt...@aeriagames.com>.
On Wed, Jan 30, 2013 at 2:44 PM, Guillermo Barbero <
guillermo.barbero@spotbros.com> wrote:

>  WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
> setting live ratio to maximum of 64.0 instead of 751.6512549537648
>
>
This looks interesting.  Doesn't this mean that the ratio of space used for
that CF for memory to serialized form is 751:1 but was forced to a lower
"sane" value?

-Bryan

Re: too many warnings of Heap is full

Posted by Guillermo Barbero <gu...@spotbros.com>.
Guys,

First thanks a lot for your answers.. now I have a little more info, lets
see if this helps:

As I said before we are using Cassandra 1.1.7 on a 6 nodes(Ips: 150-155)
(Amazon EC2 XL instances 17GB RAM) and the problem is that in the last
weeks we are seing that the performace of the cluster falls.. PHP commands
that usually lasted a few milliseconds last up to 15 seconds for just 2/3
minutes every now and then (no specific pattern) (see commands timing in
attached picture).

Today we have found this:

Prior to a fall in performance on all Cassandra nodes log I can see this:

*Node 150:*

INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1264710747(84317/476272
serialized/live bytes, 298 ops)
 INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live
bytes, 298 ops)
 INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db
(84559 bytes) for commitlog position
ReplayPosition(segmentId=1355151995930, position=32659991)
 INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.java
(line 109) Compacting
[SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'),
SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]

*Node 151:*
*
*
 INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,677 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,548 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1453632632(93461/191418
serialized/live bytes, 316 ops)
 INFO [FlushWriter:2636] 2013-01-30 21:35:41,549 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@1453632632(93461/191418 serialized/live
bytes, 316 ops)
 INFO [FlushWriter:2636] 2013-01-30 21:35:41,563 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-266-Data.db
(93711 bytes) for commitlog position
ReplayPosition(segmentId=1354832000013, position=20033362)

*Node 153:*

* *INFO [GossipTasks:1] 2013-01-30 21:35:24,822 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
 INFO [ScheduledTasks:1] 2013-01-30 21:35:24,896 GCInspector.java (line
122) GC for ParNew: 294 ms for 1 collections, 1539602448 used; max is
3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:26,897 GCInspector.java (line
122) GC for ParNew: 202 ms for 1 collections, 1829975768 used; max is
3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:27,898 GCInspector.java (line
122) GC for ParNew: 234 ms for 1 collections, 2037767104 used; max is
3886022656
 INFO [MemoryMeter:1] 2013-01-30 21:35:31,151 Memtable.java (line 213)
CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
1.4195889775077013 (just-counted was 1.4195889775077013).  calculation took
109ms for 64 columns
 INFO [MemoryMeter:1] 2013-01-30 21:35:37,806 Memtable.java (line 213)
CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
1.5276925191259334 (just-counted was 1.5276925191259334).  calculation took
11ms for 129 columns
 INFO [GossipStage:1] 2013-01-30 21:35:40,735 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,742 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [MemoryMeter:1] 2013-01-30 21:35:43,184 Memtable.java (line 213)
CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is
1.561312033901826 (just-counted was 1.561312033901826).  calculation took
234ms for 144 columns
 INFO [HintedHandoff:1] 2013-01-30 21:35:43,549 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1120216043(96278/187900
serialized/live bytes, 288 ops)

*Node 154:*
*
*
INFO [HintedHandoff:1] 2013-01-30 21:35:00,729 HintedHandOffManager.java
(line 392) Finished hinted handoff of 0 rows to endpoint /10.0.0.153
 INFO [GossipTasks:1] 2013-01-30 21:35:26,432 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,669 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,208 ColumnFamilyStore.java
(line 659) Enqueuing flush of Memtable-HintsColumnFamily@908822645(92404/185145
serialized/live bytes, 286 ops)
 INFO [FlushWriter:2496] 2013-01-30 21:35:41,209 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@908822645(92404/185145 serialized/live
bytes, 286 ops)


Node 155:

INFO [GossipTasks:1] 2013-01-30 21:35:27,117 Gossiper.java (line 831)
InetAddress /10.0.0.152 is now dead.
 INFO [GossipStage:1] 2013-01-30 21:35:40,668 Gossiper.java (line 817)
InetAddress /10.0.0.152 is now UP
 INFO [HintedHandoff:1] 2013-01-30 21:35:40,669 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
56713727820156407428984779325531226112 with IP: /10.0.0.152
 INFO [HintedHandoff:1] 2013-01-30 21:35:41,279 ColumnFamilyStore.java
(line 659) Enqueuing flush of
Memtable-HintsColumnFamily@1083199446(94578/190654
serialized/live bytes, 316 ops)

*Meanwhile on node 152:*

INFO [FlushWriter:12] 2013-01-30 21:33:42,802 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/SB_ACD/CF_R2Notifications/SB_ACD-CF_R2Notifications-hf-4175-Data.db
(639626 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=11032499)
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,701 GCInspector.java (line
122) GC for ParNew: 35087 ms for 3 collections, 442940272 used; max is
3886022656
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,934 StatusLogger.java (line
57) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,941 StatusLogger.java (line
72) ReadStage                        32       343         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
72) RequestResponseStage              0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
72) ReadRepairStage                   0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,942 StatusLogger.java (line
72) MutationStage                     0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
72) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
72) GossipStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,943 StatusLogger.java (line
72) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
72) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,944 StatusLogger.java (line
72) StreamStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
72) MemtablePostFlusher               0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
72) FlushWriter                       0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,945 StatusLogger.java (line
72) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
72) commitlog_archiver                0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
72) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,946 StatusLogger.java (line
72) HintedHandoff                     0         0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
77) CompactionManager                 0         0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
89) MessagingService                n/a      0,24
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,947 StatusLogger.java (line
99) Cache Type                     Size                 Capacity
    KeysToSave
Provider
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
100) KeyCache                     133569                  2184533
           all
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
106) RowCache                          0                        0
           all
 org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
113) ColumnFamily                Memtable ops,data
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,948 StatusLogger.java (line
116) system.NodeIdInfo                         0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.IndexInfo                          0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.LocationInfo                       0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.Versions                         3,103
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,949 StatusLogger.java (line
116) system.schema_keyspaces                   0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:40,950 StatusLogger.java (line
116) system.Migrations                         0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,059 StatusLogger.java (line
116) system.schema_columnfamilies                 0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
116) system.schema_columns                     0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
116) system.HintsColumnFamily               1,2369
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,060 StatusLogger.java (line
116) system.Schema                             0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_mailPipes            9241,30380203
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_profilePics            427,1173642
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_SBVars                         0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,080 StatusLogger.java (line
116) SB_NNCD.CF_SBCounters                   5,232
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,092 StatusLogger.java (line
116) SB_NNCD.CF_userPhoneAgenda        7919,2332898
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_BSLUsers                       0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_mailTimeline                   0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_SBMessages          54390,15054201
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,093 StatusLogger.java (line
116) SB_NNCD.CF_NewsFeedTimeline        1866,7681861
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_BSLSpots                     1,616
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_SBUniqueIds                    0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_SBChecks                       0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_NFTimelines             423,533156
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,094 StatusLogger.java (line
116) SB_NNCD.CF_SBAppUsersProfilePic                 0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_SBLikes                     16,565
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_SBCodes                  154,93735
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_attachmentsPerUser           80,107551
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,095 StatusLogger.java (line
116) SB_NNCD.CF_comments                  58,77581
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_NNCD.CF_attachments             641,247796
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_NNCD.CF_SBStats                        0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_NNCD.CF_userReversedPhoneAgenda        9047,7132129
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,096 StatusLogger.java (line
116) SB_ACD.CF_GSMembers               823,1206779
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,113 StatusLogger.java (line
116) SB_ACD.CF_NSLoad                          0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,114 StatusLogger.java (line
116) SB_ACD.CF_sessions                        0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,115 StatusLogger.java (line
116) SB_ACD.CF_userGS                   193,577229
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
116) SB_ACD.CF_userApps                      8,170
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,116 StatusLogger.java (line
116) SB_ACD.CF_GSData                     126,3503
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,117 StatusLogger.java (line
116) SB_ACD.CF_R2Notifications          1906,58481
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,118 StatusLogger.java (line
116) SB_ACD.CF_uniqueKeys                      0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,155 StatusLogger.java (line
116) SB_ACD.CF_userContactWannabes          953,438671
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,157 StatusLogger.java (line
116) SB_ACD.CF_SBApps                          0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,158 StatusLogger.java (line
116) SB_ACD.CF_SBAppUsers                      0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
116) SB_ACD.CF_R1Notifications        1354,2070513
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,159 StatusLogger.java (line
116) SB_ACD.CF_SBVotes                  463,230386
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,161 StatusLogger.java (line
116) SB_ACD.CF_userContacts           2968,4849519
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
116) SB_ACD.CF_sessionsSuspended                 0,0
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,168 StatusLogger.java (line
116) SB_ACD.CF_userBlockedList               5,207
 INFO [ScheduledTasks:1] 2013-01-30 21:35:41,172 StatusLogger.java (line
116) SB_ACD.CF_users                65386,10852776
WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 GCInspector.java (line 145)
Heap is 0.7514124302624755 full.  You may need to reduce memtable and/or
cache sizes.  Cassandra will now flush up to the two largest memtables to
free up memory.  Adjust flush_largest_memtables_at threshold in
cassandra.yaml if you don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2013-01-30 21:37:05,895 StorageService.java (line
2880) Flushing CFS(Keyspace='SB_NNCD', ColumnFamily='CF_mailPipes') to
relieve memory pressure
 INFO [ScheduledTasks:1] 2013-01-30 21:37:05,896 ColumnFamilyStore.java
(line 659) Enqueuing flush of Memtable-CF_mailPipes@192792072(6868296/30790355
serialized/live bytes, 9380 ops)
 INFO [FlushWriter:13] 2013-01-30 21:37:05,909 Memtable.java (line 264)
Writing Memtable-CF_mailPipes@192792072(6868296/30790355 serialized/live
bytes, 9380 ops)
 INFO [FlushWriter:13] 2013-01-30 21:37:06,385 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/SB_NNCD/CF_mailPipes/SB_NNCD-CF_mailPipes-hf-2165-Data.db
(2049401 bytes) for commitlog position
ReplayPosition(segmentId=1359573115280, position=13578602)
 WARN [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 202)
setting live ratio to maximum of 64.0 instead of 751.6512549537648
 INFO [MemoryMeter:1] 2013-01-30 21:37:48,079 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_R2Notifications') liveRatio is 64.0
(just-counted was 64.0).  calculation took 61ms for 8 columns
 INFO [MemoryMeter:1] 2013-01-30 21:40:47,274 Memtable.java (line 213)
CFS(Keyspace='SB_ACD', ColumnFamily='CF_userGS') liveRatio is
5.309650177091864 (just-counted was 5.309650177091864).  calculation took
11ms for 170 columns
 INFO [HintedHandoff:1] 2013-01-30 21:42:03,709 HintedHandOffManager.java
(line 296) Started hinted handoff for token:
85070591730234615865843651857942052864 with IP: /10.0.0.153
 INFO [HintedHandoff:1] 2013-01-30 21:42:03,712 ColumnFamilyStore.java
(line 659) Enqueuing flush of Memtable-HintsColumnFamily@1676047624(950/2369
serialized/live bytes, 2 ops)
 INFO [FlushWriter:14] 2013-01-30 21:42:03,713 Memtable.java (line 264)
Writing Memtable-HintsColumnFamily@1676047624(950/2369 serialized/live
bytes, 2 ops)
 INFO [FlushWriter:14] 2013-01-30 21:42:03,730 Memtable.java (line 305)
Completed flushing
/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db
(1016 bytes) for commitlog position ReplayPosition(segmentId=1359573115280,
position=17323736)
 INFO [CompactionExecutor:48] 2013-01-30 21:42:03,731 CompactionTask.java
(line 109) Compacting
[SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-122-Data.db'),
SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-121-Data.db')]

The cluster is perfectly balanced:

Note: Ownership information does not include topology, please specify a
keyspace.
Address         DC          Rack        Status State   Load            Owns
               Token

               141784319550391032739561396922763706368
10.0.0.150      datacenter1 rack1       Up     Normal  13.25 GB
 16.67%              0
10.0.0.151      datacenter1 rack1       Up     Normal  10.31 GB
 16.67%              28356863910078203714492389662765613056
10.0.0.152      datacenter1 rack1       Up     Normal  8.58 GB
16.67%              56713727820156407428984779325531226112
10.0.0.153      datacenter1 rack1       Up     Normal  8.1 GB
 16.67%              85070591730234615865843651857942052864
10.0.0.154      datacenter1 rack1       Up     Normal  8.28 GB
16.67%              113427455640312814857969558651062452224
10.0.0.155      datacenter1 rack1       Up     Normal  9.38 GB
16.67%              141784319550391032739561396922763706368


I guess something happens on 152 that makes other nodes believe it is dead,
then does something to fix itself and comes back after (messing up
performance in the meanwhile) .Does this makes any sense? Any clue of what
might be going on? I can post more logs if you guys need

Thanks again!!

Guillermo



2013/1/30 Nate McCall <zz...@gmail.com>

> What's the output of nodetool cfstats for those 2 column families on
> cassNode2 and cassNode3? And what is the replication factor for this
> cluster?
>
> Per the previous reply, nodetool ring should show each of your nodes
> with ~16.7% of the data if well balanced.
>
> Also, the auto-detection for memory sizes in the startup script is a
> little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
> ram. It usually ends up allocating 4g/400m (max and young) whereas
> 8g/800m will give you some more breathing room.
>
> On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <bt...@aeriagames.com>
> wrote:
> > My guess is that those one or two nodes with the gc pressure also have
> more
> > rows in your big CF.  More rows could be due to imbalanced distribution
> if
> > your'e not using a random partitioner or from those nodes not yet
> removing
> > deleted rows which other nodes may have done.
> >
> > JVM heap space is used for a few things which scale with key count
> > including:
> > - bloom filter (for C* < 1.2)
> > - index samples
> >
> > Other space is used but can be more easily controlled by tuning for
> > - memtable
> > - compaction
> > - key cache
> > - row cache
> >
> >
> > So, if those nodes have more rows (check using "nodetool ring" or
> "nodetool
> > cfstats") than the others you can try to:
> > - reduce the number of rows by adding nodes, run manual / tune
> compactions
> > to remove rows with expired tombstones, etc.
> > - increase bloom filter fp chance
> > - increase jvm heap size (don't go too big)
> > - disable key or row cache
> > - increase index sample interval
> >
> > Not all of those things are generally good especially to the extreme so
> > don't go setting a 20 GB jvm heap without understanding the consequences
> for
> > example.
> >
> > -Bryan
> >
> >
> > On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
> > <gu...@spotbros.com> wrote:
> >>
> >> Hi,
> >>
> >>   I'm viewing a weird behaviour in my cassandra cluster. Most of the
> >> warning messages are due to Heap is % full. According to this link
> >>
> >> (
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html
> )
> >> there are two ways to "reduce pressure":
> >> 1. Decrease the cache sizes
> >> 2. Increase the index interval size
> >>
> >> Most of the flushes are in two column families (users and messages), I
> >> guess that's because the most mutations are there.
> >>
> >> I still have not applied those changes to the production environment.
> >> Do you recommend any other meassure? Should I set specific tunning for
> >> these two CFs? Should I check another metric?
> >>
> >> Additionally, the distribution of warning messages is not uniform
> >> along the cluster. Why could cassandra be doing this? What should I do
> >> to find out how to fix this?
> >>
> >> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
> >> the java version is the following:
> >> java version "1.6.0_37"
> >> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
> >> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
> >>
> >> The cassandra system.log is resumed here (numer of messages, cassandra
> >> node, class that reports the message, first word of the message)
> >> 2013-01-26
> >>       5 cassNode0: GCInspector.java Heap
> >>       5 cassNode0: StorageService.java Flushing
> >>     232 cassNode2: GCInspector.java Heap
> >>     232 cassNode2: StorageService.java Flushing
> >>     104 cassNode3: GCInspector.java Heap
> >>     104 cassNode3: StorageService.java Flushing
> >>       3 cassNode4: GCInspector.java Heap
> >>       3 cassNode4: StorageService.java Flushing
> >>       3 cassNode5: GCInspector.java Heap
> >>       3 cassNode5: StorageService.java Flushing
> >>
> >> 2013-01-27
> >>       2 cassNode0: GCInspector.java Heap
> >>       2 cassNode0: StorageService.java Flushing
> >>       3 cassNode1: GCInspector.java Heap
> >>       3 cassNode1: StorageService.java Flushing
> >>     189 cassNode2: GCInspector.java Heap
> >>     189 cassNode2: StorageService.java Flushing
> >>     104 cassNode3: GCInspector.java Heap
> >>     104 cassNode3: StorageService.java Flushing
> >>       1 cassNode4: GCInspector.java Heap
> >>       1 cassNode4: StorageService.java Flushing
> >>       1 cassNode5: GCInspector.java Heap
> >>       1 cassNode5: StorageService.java Flushing
> >>
> >> 2013-01-28
> >>       2 cassNode0: GCInspector.java Heap
> >>       2 cassNode0: StorageService.java Flushing
> >>       1 cassNode1: GCInspector.java Heap
> >>       1 cassNode1: StorageService.java Flushing
> >>       1 cassNode2: AutoSavingCache.java Reducing
> >>     343 cassNode2: GCInspector.java Heap
> >>     342 cassNode2: StorageService.java Flushing
> >>     181 cassNode3: GCInspector.java Heap
> >>     181 cassNode3: StorageService.java Flushing
> >>       4 cassNode4: GCInspector.java Heap
> >>       4 cassNode4: StorageService.java Flushing
> >>       3 cassNode5: GCInspector.java Heap
> >>       3 cassNode5: StorageService.java Flushing
> >>
> >> 2013-01-29
> >>       2 cassNode0: GCInspector.java Heap
> >>       2 cassNode0: StorageService.java Flushing
> >>       3 cassNode1: GCInspector.java Heap
> >>       3 cassNode1: StorageService.java Flushing
> >>     156 cassNode2: GCInspector.java Heap
> >>     156 cassNode2: StorageService.java Flushing
> >>      71 cassNode3: GCInspector.java Heap
> >>      71 cassNode3: StorageService.java Flushing
> >>       2 cassNode4: GCInspector.java Heap
> >>       2 cassNode4: StorageService.java Flushing
> >>       2 cassNode5: GCInspector.java Heap
> >>       1 cassNode5: Memtable.java setting
> >>       2 cassNode5: StorageService.java Flushing
> >>
> >> --
> >>
> >> Guillermo Barbero - Backend Team
> >>
> >> Spotbros Technologies
> >
> >
> >
>



-- 
<http://spotbros.com>
Guillermo Barbero - Backend Team

Spotbros Technologies
<http://www.facebook.com/spotbros>  <http://twitter.com/spotbros>

Re: too many warnings of Heap is full

Posted by Nate McCall <zz...@gmail.com>.
What's the output of nodetool cfstats for those 2 column families on
cassNode2 and cassNode3? And what is the replication factor for this
cluster?

Per the previous reply, nodetool ring should show each of your nodes
with ~16.7% of the data if well balanced.

Also, the auto-detection for memory sizes in the startup script is a
little off w/r/t m1.xlarge because of the 'slightly less than 16gb' of
ram. It usually ends up allocating 4g/400m (max and young) whereas
8g/800m will give you some more breathing room.

On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot <bt...@aeriagames.com> wrote:
> My guess is that those one or two nodes with the gc pressure also have more
> rows in your big CF.  More rows could be due to imbalanced distribution if
> your'e not using a random partitioner or from those nodes not yet removing
> deleted rows which other nodes may have done.
>
> JVM heap space is used for a few things which scale with key count
> including:
> - bloom filter (for C* < 1.2)
> - index samples
>
> Other space is used but can be more easily controlled by tuning for
> - memtable
> - compaction
> - key cache
> - row cache
>
>
> So, if those nodes have more rows (check using "nodetool ring" or "nodetool
> cfstats") than the others you can try to:
> - reduce the number of rows by adding nodes, run manual / tune compactions
> to remove rows with expired tombstones, etc.
> - increase bloom filter fp chance
> - increase jvm heap size (don't go too big)
> - disable key or row cache
> - increase index sample interval
>
> Not all of those things are generally good especially to the extreme so
> don't go setting a 20 GB jvm heap without understanding the consequences for
> example.
>
> -Bryan
>
>
> On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero
> <gu...@spotbros.com> wrote:
>>
>> Hi,
>>
>>   I'm viewing a weird behaviour in my cassandra cluster. Most of the
>> warning messages are due to Heap is % full. According to this link
>>
>> (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html)
>> there are two ways to "reduce pressure":
>> 1. Decrease the cache sizes
>> 2. Increase the index interval size
>>
>> Most of the flushes are in two column families (users and messages), I
>> guess that's because the most mutations are there.
>>
>> I still have not applied those changes to the production environment.
>> Do you recommend any other meassure? Should I set specific tunning for
>> these two CFs? Should I check another metric?
>>
>> Additionally, the distribution of warning messages is not uniform
>> along the cluster. Why could cassandra be doing this? What should I do
>> to find out how to fix this?
>>
>> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
>> the java version is the following:
>> java version "1.6.0_37"
>> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
>> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>>
>> The cassandra system.log is resumed here (numer of messages, cassandra
>> node, class that reports the message, first word of the message)
>> 2013-01-26
>>       5 cassNode0: GCInspector.java Heap
>>       5 cassNode0: StorageService.java Flushing
>>     232 cassNode2: GCInspector.java Heap
>>     232 cassNode2: StorageService.java Flushing
>>     104 cassNode3: GCInspector.java Heap
>>     104 cassNode3: StorageService.java Flushing
>>       3 cassNode4: GCInspector.java Heap
>>       3 cassNode4: StorageService.java Flushing
>>       3 cassNode5: GCInspector.java Heap
>>       3 cassNode5: StorageService.java Flushing
>>
>> 2013-01-27
>>       2 cassNode0: GCInspector.java Heap
>>       2 cassNode0: StorageService.java Flushing
>>       3 cassNode1: GCInspector.java Heap
>>       3 cassNode1: StorageService.java Flushing
>>     189 cassNode2: GCInspector.java Heap
>>     189 cassNode2: StorageService.java Flushing
>>     104 cassNode3: GCInspector.java Heap
>>     104 cassNode3: StorageService.java Flushing
>>       1 cassNode4: GCInspector.java Heap
>>       1 cassNode4: StorageService.java Flushing
>>       1 cassNode5: GCInspector.java Heap
>>       1 cassNode5: StorageService.java Flushing
>>
>> 2013-01-28
>>       2 cassNode0: GCInspector.java Heap
>>       2 cassNode0: StorageService.java Flushing
>>       1 cassNode1: GCInspector.java Heap
>>       1 cassNode1: StorageService.java Flushing
>>       1 cassNode2: AutoSavingCache.java Reducing
>>     343 cassNode2: GCInspector.java Heap
>>     342 cassNode2: StorageService.java Flushing
>>     181 cassNode3: GCInspector.java Heap
>>     181 cassNode3: StorageService.java Flushing
>>       4 cassNode4: GCInspector.java Heap
>>       4 cassNode4: StorageService.java Flushing
>>       3 cassNode5: GCInspector.java Heap
>>       3 cassNode5: StorageService.java Flushing
>>
>> 2013-01-29
>>       2 cassNode0: GCInspector.java Heap
>>       2 cassNode0: StorageService.java Flushing
>>       3 cassNode1: GCInspector.java Heap
>>       3 cassNode1: StorageService.java Flushing
>>     156 cassNode2: GCInspector.java Heap
>>     156 cassNode2: StorageService.java Flushing
>>      71 cassNode3: GCInspector.java Heap
>>      71 cassNode3: StorageService.java Flushing
>>       2 cassNode4: GCInspector.java Heap
>>       2 cassNode4: StorageService.java Flushing
>>       2 cassNode5: GCInspector.java Heap
>>       1 cassNode5: Memtable.java setting
>>       2 cassNode5: StorageService.java Flushing
>>
>> --
>>
>> Guillermo Barbero - Backend Team
>>
>> Spotbros Technologies
>
>
>

Re: too many warnings of Heap is full

Posted by Bryan Talbot <bt...@aeriagames.com>.
My guess is that those one or two nodes with the gc pressure also have more
rows in your big CF.  More rows could be due to imbalanced distribution if
your'e not using a random partitioner or from those nodes not yet removing
deleted rows which other nodes may have done.

JVM heap space is used for a few things which scale with key count
including:
- bloom filter (for C* < 1.2)
- index samples

Other space is used but can be more easily controlled by tuning for
- memtable
- compaction
- key cache
- row cache


So, if those nodes have more rows (check using "nodetool ring" or "nodetool
cfstats") than the others you can try to:
- reduce the number of rows by adding nodes, run manual / tune compactions
to remove rows with expired tombstones, etc.
- increase bloom filter fp chance
- increase jvm heap size (don't go too big)
- disable key or row cache
- increase index sample interval

Not all of those things are generally good especially to the extreme so
don't go setting a 20 GB jvm heap without understanding
the consequences for example.

-Bryan


On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero <
guillermo.barbero@spotbros.com> wrote:

> Hi,
>
>   I'm viewing a weird behaviour in my cassandra cluster. Most of the
> warning messages are due to Heap is % full. According to this link
> (
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html
> )
> there are two ways to "reduce pressure":
> 1. Decrease the cache sizes
> 2. Increase the index interval size
>
> Most of the flushes are in two column families (users and messages), I
> guess that's because the most mutations are there.
>
> I still have not applied those changes to the production environment.
> Do you recommend any other meassure? Should I set specific tunning for
> these two CFs? Should I check another metric?
>
> Additionally, the distribution of warning messages is not uniform
> along the cluster. Why could cassandra be doing this? What should I do
> to find out how to fix this?
>
> cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2)
> the java version is the following:
> java version "1.6.0_37"
> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>
> The cassandra system.log is resumed here (numer of messages, cassandra
> node, class that reports the message, first word of the message)
> 2013-01-26
>       5 cassNode0: GCInspector.java Heap
>       5 cassNode0: StorageService.java Flushing
>     232 cassNode2: GCInspector.java Heap
>     232 cassNode2: StorageService.java Flushing
>     104 cassNode3: GCInspector.java Heap
>     104 cassNode3: StorageService.java Flushing
>       3 cassNode4: GCInspector.java Heap
>       3 cassNode4: StorageService.java Flushing
>       3 cassNode5: GCInspector.java Heap
>       3 cassNode5: StorageService.java Flushing
>
> 2013-01-27
>       2 cassNode0: GCInspector.java Heap
>       2 cassNode0: StorageService.java Flushing
>       3 cassNode1: GCInspector.java Heap
>       3 cassNode1: StorageService.java Flushing
>     189 cassNode2: GCInspector.java Heap
>     189 cassNode2: StorageService.java Flushing
>     104 cassNode3: GCInspector.java Heap
>     104 cassNode3: StorageService.java Flushing
>       1 cassNode4: GCInspector.java Heap
>       1 cassNode4: StorageService.java Flushing
>       1 cassNode5: GCInspector.java Heap
>       1 cassNode5: StorageService.java Flushing
>
> 2013-01-28
>       2 cassNode0: GCInspector.java Heap
>       2 cassNode0: StorageService.java Flushing
>       1 cassNode1: GCInspector.java Heap
>       1 cassNode1: StorageService.java Flushing
>       1 cassNode2: AutoSavingCache.java Reducing
>     343 cassNode2: GCInspector.java Heap
>     342 cassNode2: StorageService.java Flushing
>     181 cassNode3: GCInspector.java Heap
>     181 cassNode3: StorageService.java Flushing
>       4 cassNode4: GCInspector.java Heap
>       4 cassNode4: StorageService.java Flushing
>       3 cassNode5: GCInspector.java Heap
>       3 cassNode5: StorageService.java Flushing
>
> 2013-01-29
>       2 cassNode0: GCInspector.java Heap
>       2 cassNode0: StorageService.java Flushing
>       3 cassNode1: GCInspector.java Heap
>       3 cassNode1: StorageService.java Flushing
>     156 cassNode2: GCInspector.java Heap
>     156 cassNode2: StorageService.java Flushing
>      71 cassNode3: GCInspector.java Heap
>      71 cassNode3: StorageService.java Flushing
>       2 cassNode4: GCInspector.java Heap
>       2 cassNode4: StorageService.java Flushing
>       2 cassNode5: GCInspector.java Heap
>       1 cassNode5: Memtable.java setting
>       2 cassNode5: StorageService.java Flushing
>
> --
>
> Guillermo Barbero - Backend Team
>
> Spotbros Technologies
>