Posted to commits@cassandra.apache.org by "Raoufeh Hashemian (Jira)" <ji...@apache.org> on 2022/07/08 21:29:00 UTC

[jira] [Updated] (CASSANDRA-17743) DateTiered Compaction starts at 0 UTC and CPU stays at 100% for hours

     [ https://issues.apache.org/jira/browse/CASSANDRA-17743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raoufeh Hashemian updated CASSANDRA-17743:
------------------------------------------
    Attachment: Cassandra_htop.png

> DateTiered Compaction starts at 0 UTC and CPU stays at 100% for hours
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-17743
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17743
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Raoufeh Hashemian
>            Priority: Normal
>         Attachments: Cassandra_htop.png
>
>
>  Every few days, one or two random nodes in the cluster suddenly have their CPU pinned at 100% utilization, with CPU load staying at around 70.
> We see log lines like this repeating every second:
> {code:java}
> Spinning trying to capture readers [BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-45664-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-51823-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-60406-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63287-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-14971-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63722-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63370-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63718-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63712-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-6629-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-56204-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63341-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-62260-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63695-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-11863-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63373-big-Data.db'), 
BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-2306-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-39543-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-38025-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63719-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63365-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63560-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-17819-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-36341-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-8587-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63706-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63311-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63085-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63724-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63372-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-33438-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-48749-big-Data.db'), 
BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-10328-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-8104-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63720-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-21002-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-54608-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-7574-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-4494-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63697-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-42502-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-3365-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-30295-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-5603-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-62647-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63371-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-27203-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-9257-big-Data.db'), 
BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-57715-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63182-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-19467-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63721-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-24053-big-Data.db')], released: [BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-2306-big-Data.db')], {code}
> The problem started happening when the first data reached its TTL (30 days).
> Looking at the logs, we noticed that the high-load period starts with a compaction on one of our tables that uses the DateTiered compaction strategy. The compaction window is the default of 1 day.
> Once the compaction finishes (after 6+ hours), the log lines stop appearing.
> Running htop during the incident shows many threads named "RMI TCP connect" using CPU.
> !https://confluence-eng-rtp2.cisco.com/conf/download/attachments/420544492/Screen%20Shot%202022-06-29%20at%206.36.28%20PM.png?version=1&modificationDate=1656553092761&api=v2!
> Cassandra version: 4.0.1
> Node type: r5.2xlarge
> We don't run repair on the cluster.
> We found https://issues.apache.org/jira/browse/CASSANDRA-10829 and https://issues.apache.org/jira/browse/CASSANDRA-11155, which are existing bug fixes for the same log line.
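
The "Spinning trying to capture readers" line quoted above is long enough that the single released reader (nb-2306 in that sample) is easy to miss. The following is a minimal sketch, not part of the original report, of how such a line could be split into the readers being captured and the readers reported as released. The function name parse_spinning_line is hypothetical, and the format is assumed only from the one sample line in this report.

```python
import re

# Matches the path inside each BigTableReader(path='...') entry.
# Assumption: paths never contain a single quote.
READER_PATH = re.compile(r"BigTableReader\(path='([^']+)'\)")


def parse_spinning_line(line):
    """Split a 'Spinning trying to capture readers' log line into
    (wanted_paths, released_paths).

    Hypothetical helper: the layout '[...], released: [...]' is inferred
    from the sample log line only, not from Cassandra's source code.
    """
    head, sep, tail = line.partition("], released: [")
    if not sep:
        raise ValueError("no 'released' section found in log line")
    return READER_PATH.findall(head), READER_PATH.findall(tail)


if __name__ == "__main__":
    # Shortened, made-up sample in the same shape as the quoted log line.
    sample = (
        "Spinning trying to capture readers "
        "[BigTableReader(path='/data/nb-1-big-Data.db'), "
        "BigTableReader(path='/data/nb-2-big-Data.db')], "
        "released: [BigTableReader(path='/data/nb-2-big-Data.db')], "
    )
    wanted, released = parse_spinning_line(sample)
    print(len(wanted), "readers wanted; released:", released)
```

Running this over the repeated log lines would show whether the same SSTable is reported as released every time, which is the detail worth including when comparing against CASSANDRA-10829 and CASSANDRA-11155.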



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
