Posted to user@cassandra.apache.org by Krish Donald <go...@gmail.com> on 2020/08/20 15:33:11 UTC

How to know if we need to increase heap size?

Hi,

We have a cluster where, if reads suddenly increase 2-3 times, Cassandra
CPU goes to around 100% on a few nodes (we have 48-CPU machines with 128GB
RAM) and Cassandra becomes unresponsive.
We are on 3.11.5 and using G1GC with a 16GB heap.
Going through system.log and gc.log, I see that system.log just prints
messages like the ones below every 5 seconds (I have removed the lines for
many keyspaces to reduce the size of the text), and a lot of messages are
getting printed in gc.log. I feel that I may need to increase the heap size
on these nodes, but I want to understand how to determine whether the heap
size should be increased. Nodes are not dying due to OOMs. When we have
OOMs, we know for sure we need to increase the heap size, but *what should
we look for in gc.log, system.log and debug.log to determine whether we
have to increase the heap size?*

INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,368 MessagingService.java:1246 - READ messages were dropped in last 5000 ms: 199 internal and 232 cross node. Mean internal dropped latency: 10443 ms and Mean cross-node dropped latency: 10402 ms
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,369 StatusLogger.java:47 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,377 StatusLogger.java:51 - MutationStage                     0         0       80051890         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 - ViewMutationStage                 0         0              0         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 - ReadStage                       192      1331      152624049         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 - RequestResponseStage              0         0      172822890         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 - ReadRepairStage                   0         0        1545869         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 - CounterMutationStage              0         0              0         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 - MiscStage                         0         0              0         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 - CompactionExecutor                0         0         623536         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 - MemtableReclaimMemory             0         0           6700         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 - PendingRangeCalculator            0         0             18         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 - GossipStage                       0         0        1613366         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 - SecondaryIndexManagement          0         0              0         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 - HintsDispatcher                   0         0              5         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 - MigrationStage                    0         0              1         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 - MemtablePostFlush                 0         0          14830         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 - PerDiskMemtableFlushWriter_0         0         0           6700         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 - ValidationExecutor                0         0              0         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,382 StatusLogger.java:51 - Sampler                           0         0              0         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,382 StatusLogger.java:51 - MemtableFlushWriter               0         0           6700         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,382 StatusLogger.java:51 - InternalResponseStage             0         0          33229         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:51 - AntiEntropyStage                  0         0              0         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:51 - CacheCleanupExecutor              0         0              0         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:51 - Native-Transport-Requests       661         0       84577742         0                 0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:61 - CompactionManager                 0         0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:73 - MessagingService                n/a       0/0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:83 - Cache Type                     Size                 Capacity               KeysToSave
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:85 - KeyCache                  104857576                104857600                      all
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:91 - RowCache                          0                        0                      all
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:98 - Table                       Memtable ops,data
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,429 StatusLogger.java:101 - system_distributed.parent_repair_history                 0,0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,429 StatusLogger.java:101 - system_distributed.repair_history                 0,0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 - system_distributed.view_build_status                 0,0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 - system.compaction_history             12,3327
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 - system.schema_aggregates                  0,0
INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 - system.schema_triggers                    0,0

Thanks
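Reading a StatusLogger dump like the one in this post mostly comes down to two questions: which thread pools have work queued (Pending) or Blocked, and how many messages were dropped. The minimal Python sketch below pulls both out of system.log lines. It assumes one log entry per line, as Cassandra writes them to system.log (not wrapped as in an email). The regexes and the helpers `parse_pools`, `backed_up` and `dropped_counts` are hypothetical, written for illustration rather than taken from any Cassandra tool. On this dump it singles out ReadStage (192 active, 1331 pending) and 431 dropped READs. That shows reads are queuing long enough to be dropped, but not whether GC or plain CPU saturation is the cause.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for a StatusLogger thread-pool row, e.g.
# "... StatusLogger.java:51 - ReadStage   192   1331   152624049   0   0"
POOL_ROW = re.compile(
    r"StatusLogger\.java:\d+ - (?P<pool>\S+)\s+(?P<active>\d+)\s+(?P<pending>\d+)\s+"
    r"(?P<completed>\d+)\s+(?P<blocked>\d+)\s+(?P<all_time_blocked>\d+)\s*$"
)
# Hypothetical pattern for the MessagingService dropped-message summary.
DROPPED = re.compile(
    r"(?P<verb>\w+) messages were dropped in last \d+ ms: "
    r"(?P<internal>\d+) internal and (?P<cross>\d+) cross node"
)


class PoolStats(NamedTuple):
    pool: str
    active: int
    pending: int
    completed: int
    blocked: int
    all_time_blocked: int


def parse_pools(lines):
    """Return one PoolStats per StatusLogger thread-pool row found in `lines`."""
    stats = []
    for line in lines:
        m = POOL_ROW.search(line)
        if m:
            stats.append(PoolStats(
                m["pool"], int(m["active"]), int(m["pending"]), int(m["completed"]),
                int(m["blocked"]), int(m["all_time_blocked"])))
    return stats


def backed_up(stats):
    """Pools with queued or blocked tasks, most backed-up first."""
    busy = [s for s in stats if s.pending or s.blocked]
    return sorted(busy, key=lambda s: s.pending + s.blocked, reverse=True)


def dropped_counts(lines):
    """Total dropped messages (internal + cross-node) per message type."""
    totals = {}
    for line in lines:
        m = DROPPED.search(line)
        if m:
            totals[m["verb"]] = totals.get(m["verb"], 0) + int(m["internal"]) + int(m["cross"])
    return totals


if __name__ == "__main__":
    # Three entries from the log in this post, one entry per line.
    sample = [
        "INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,368 MessagingService.java:1246 - READ messages were dropped in last 5000 ms: 199 internal and 232 cross node. Mean internal dropped latency: 10443 ms and Mean cross-node dropped latency: 10402 ms",
        "INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,377 StatusLogger.java:51 - MutationStage                     0         0       80051890         0                 0",
        "INFO  [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 - ReadStage                       192      1331      152624049         0                 0",
    ]
    print([s.pool for s in backed_up(parse_pools(sample))])  # ['ReadStage']
    print(dropped_counts(sample))  # {'READ': 431}
```

Only Pending and Blocked are used for ranking because Active on its own just reflects the configured pool size under load.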

Re: How to know if we need to increase heap size?

Posted by Elliott Sims <el...@backblaze.com>.
You want to look for full or long GCs in the logs, as well as how much
total time it's spending in GC as a percentage of wall-clock time. The
latter is probably more relevant, since you're not seeing long pauses with
one core pegged and the rest idle. G1 handles oversized heaps well, so it's
worth bumping the heap to 20-27GB just to see what happens.
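Both checks (full or long GCs, and the share of time spent paused) can be approximated from gc.log with a short script. The following is a minimal Python sketch under assumptions not confirmed in this thread: a JDK 8 gc.log written with -XX:+PrintGCApplicationStoppedTime and -XX:+PrintGCDateStamps (check your jvm.options), and a hypothetical helper name, `gc_stop_summary`. It sums the "Total time for which application threads were stopped" entries, reports the longest one, counts "Full GC" lines, and expresses stopped time as a percentage of the time span the lines cover. Stopped time includes every safepoint, not only GC pauses, so treat the result as an upper bound on GC time.

```python
import re
from datetime import datetime

# Assumed JDK 8 line format (-XX:+PrintGCApplicationStoppedTime), optionally
# prefixed by a -XX:+PrintGCDateStamps stamp such as "2020-08-19T08:13:12.368+0000: ".
STOPPED = re.compile(
    r"^(?:(?P<stamp>\d{4}-\d{2}-\d{2}T[\d:.]+[+-]\d{4}): )?.*?"
    r"Total time for which application threads were stopped: (?P<secs>\d+\.\d+) seconds"
)
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def gc_stop_summary(lines):
    """Rough pause summary for a slice of gc.log.

    stopped_pct is total stopped time divided by the span between the earliest
    and latest date stamps seen, so it is only meaningful for a long enough slice.
    """
    total = longest = 0.0
    full_gcs = 0
    first = last = None
    for line in lines:
        if "Full GC" in line:
            full_gcs += 1
        m = STOPPED.search(line)
        if m is None:
            continue
        secs = float(m["secs"])
        total += secs
        longest = max(longest, secs)
        if m["stamp"]:
            ts = datetime.strptime(m["stamp"], STAMP_FORMAT)
            first = ts if first is None else min(first, ts)
            last = ts if last is None else max(last, ts)
    span = (last - first).total_seconds() if first is not None else 0.0
    return {
        "stopped_s": total,
        "longest_s": longest,
        "full_gcs": full_gcs,
        "stopped_pct": 100.0 * total / span if span > 0 else None,
    }


if __name__ == "__main__":
    # Synthetic lines for illustration only; they are not from the poster's cluster.
    sample = [
        "2020-08-19T08:13:12.000+0000: Total time for which application threads were stopped: 0.5000000 seconds, Stopping threads took: 0.0001000 seconds",
        "2020-08-19T08:13:20.000+0000: [Full GC (Allocation Failure)  15G->12G(16G), 1.2000000 secs]",
        "2020-08-19T08:13:22.000+0000: Total time for which application threads were stopped: 1.5000000 seconds, Stopping threads took: 0.0001000 seconds",
    ]
    print(gc_stop_summary(sample))
```

On the synthetic sample this reports 2.0s stopped over a 10s span, i.e. 20%, with one Full GC. A high stopped percentage during the read spikes would point at GC pressure and make a bigger heap worth trying; a low one alongside 100% CPU points at the "running out of CPU" case instead.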

If it's not GC, then you're just running out of CPU and need more, or need
to figure out what queries are killing it.

On Thu, Aug 20, 2020 at 10:45 AM Lee Tewksbury <ex...@gmail.com> wrote:

> Depending on your thread count, you can consider increasing the max native
> transport threads and concurrent reads. But the keys to Cassandra are
> pretty much: make good data models, make good queries, and if you can't
> keep up, double the cluster size. If you're following the documentation on
> heap size (1/2 RAM or 20GB, whichever is lower), then I would suggest
> increasing threads but, more importantly, increasing node count.

Re: How to know if we need to increase heap size?

Posted by Lee Tewksbury <ex...@gmail.com>.
Depending on your thread count, you can consider increasing the max native
transport threads and concurrent reads. But the keys to Cassandra are
pretty much: make good data models, make good queries, and if you can't
keep up, double the cluster size. If you're following the documentation on
heap size (1/2 RAM or 20GB, whichever is lower), then I would suggest
increasing threads but, more importantly, increasing node count.
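The heap rule quoted here is simple arithmetic: for the 128GB nodes in the original post it gives min(64GB, 20GB) = 20GB, 4GB more than the current 16GB. The Python sketch below computes it next to the automatic default that, as far as I know, cassandra-env.sh picks when no heap size is configured, max(min(RAM/2, 1024MB), min(RAM/4, 8192MB)). Both function names are made up for illustration, and the script formula should be checked against the cassandra-env.sh shipped with your version.

```python
def half_ram_capped_heap_gb(ram_gb: float) -> float:
    """Heap size per the rule quoted in this thread: half of RAM, capped at 20GB."""
    return min(ram_gb / 2, 20.0)


def cassandra_env_default_heap_mb(ram_mb: int) -> int:
    """Assumed cassandra-env.sh default when no heap is set:
    max(min(RAM/2, 1024MB), min(RAM/4, 8192MB)), using integer division like a shell script."""
    half = min(ram_mb // 2, 1024)
    quarter = min(ram_mb // 4, 8192)
    return max(half, quarter)


if __name__ == "__main__":
    ram_gb = 128  # RAM per node in the original post
    print(half_ram_capped_heap_gb(ram_gb))              # 20.0
    print(cassandra_env_default_heap_mb(ram_gb * 1024))  # 8192
```

On a 128GB node the automatic default would stay at 8GB, so a 16GB heap is already an explicit setting, and the half-RAM-capped-at-20GB rule leaves about 4GB of headroom above it.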
