Posted to commits@cassandra.apache.org by "Brandon Williams (Jira)" <ji...@apache.org> on 2022/07/11 10:24:00 UTC

[jira] [Commented] (CASSANDRA-17743) DateTiered Compaction starts at 0 UTC and CPU stays at 100% for hours

    [ https://issues.apache.org/jira/browse/CASSANDRA-17743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564912#comment-17564912 ] 

Brandon Williams commented on CASSANDRA-17743:
----------------------------------------------

bq. Running htop during the incident shows lots of threads called "RMI TCP connect" are using CPU. 

These are JMX connections. Do you know where they come from and what they are doing? It's possible they were doing something that prevented the readers from being captured for compaction.

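htop shows Linux thread names, which are limited to 15 characters, so "RMI TCP connect" in the screenshot is most likely the JVM's `RMI TCP Connection(...)` threads, which handle incoming JMX/RMI client connections (nodetool, for example, talks to Cassandra over JMX). As a sketch of one way to answer the question above, the hypothetical `RmiThreads` class below connects over JMX and prints each such thread's state, CPU time and full stack trace. It assumes Cassandra's default JMX port 7199 and no JMX authentication. `RmiThreads`, `linuxThreadName` and `rmiThreads` are illustrative names; the `javax.management` and `java.lang.management` calls are standard JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: list the "RMI TCP Connection(...)" threads of a JVM reached over JMX,
// with their state, CPU time and full stack traces.
final class RmiThreads {
    static final String RMI_PREFIX = "RMI TCP Connection";

    // Linux keeps at most 15 characters of a thread name (TASK_COMM_LEN is 16 bytes
    // including the trailing NUL); htop shows this truncated name. Assumes ASCII names.
    static String linuxThreadName(String jvmName) {
        return jvmName.length() > 15 ? jvmName.substring(0, 15) : jvmName;
    }

    static boolean isRmiConnectionThread(String name) {
        return name != null && name.startsWith(RMI_PREFIX);
    }

    // Works with the local ThreadMXBean or with a remote proxy obtained over JMX.
    static List<ThreadInfo> rmiThreads(ThreadMXBean threads) {
        List<ThreadInfo> result = new ArrayList<>();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null && isRmiConnectionThread(info.getThreadName())) {
                result.add(info);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        // 7199 is Cassandra's default JMX port; adjust if your nodes use another one.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ThreadMXBean remote = ManagementFactory.newPlatformMXBeanProxy(
                    mbs, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            for (ThreadInfo info : rmiThreads(remote)) {
                // -1 if CPU time measurement is disabled or the thread is no longer alive.
                long cpuNs = remote.getThreadCpuTime(info.getThreadId());
                System.out.printf("%s state=%s cpu_ns=%d%n",
                        info.getThreadName(), info.getThreadState(), cpuNs);
                for (StackTraceElement frame : info.getStackTrace()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }
}
```

Run it with the node's address as the only argument. Its own JMX session will show up as one more `RMI TCP Connection` thread in the output. If JMX authentication is enabled, credentials can be passed through the `env` map of `JMXConnectorFactory.connect(url, env)` under the `JMXConnector.CREDENTIALS` key.
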
> DateTiered Compaction starts at 0 UTC and CPU stays at 100% for hours
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-17743
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17743
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Raoufeh Hashemian
>            Priority: Normal
>         Attachments: Cassandra_htop.png
>
>
> Every few days, one or two random nodes in the cluster suddenly have their CPU pinned at 100% utilization, with CPU load staying at around 70, starting exactly at 0 UTC.
> We see log lines like this repeating every second:
> {code:java}
> Spinning trying to capture readers [BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-45664-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-51823-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-60406-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63287-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-14971-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63722-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63370-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63718-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63712-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-6629-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-56204-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63341-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-62260-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63695-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-11863-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63373-big-Data.db'), 
BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-2306-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-39543-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-38025-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63719-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63365-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63560-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-17819-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-36341-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-8587-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63706-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63311-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63085-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63724-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63372-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-33438-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-48749-big-Data.db'), 
BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-10328-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-8104-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63720-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-21002-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-54608-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-7574-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-4494-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63697-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-42502-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-3365-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-30295-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-5603-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-62647-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63371-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-27203-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-9257-big-Data.db'), 
BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-57715-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63182-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-19467-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-63721-big-Data.db'), BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-24053-big-Data.db')], released: [BigTableReader(path='/data/cassandra/data/my_keyspace/my_table-1236bbe0d78411ec917781a26cd8c223/nb-2306-big-Data.db')], {code}
> The problem started when the first data reached its TTL (30 days).
> Looking at the logs, we noticed that the high-load period starts with a compaction on one of our tables that uses the DateTiered compaction strategy. The compaction window is the default, 1 day.
> Once the compaction finishes (after 6+ hours), the log lines stop appearing.
> Running htop during the incident shows lots of threads called "RMI TCP connect" are using CPU.  !Cassandra_htop.png!
> Cassandra version: 4.0.1
> Node type: r5.2xlarge
> We don't run repair on the cluster.
> We found https://issues.apache.org/jira/browse/CASSANDRA-10829 and https://issues.apache.org/jira/browse/CASSANDRA-11155, existing bug fixes for the same log line.

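The "Spinning trying to capture readers" warning appears to come from a retry loop in which every reader in the selected set must be referenced together. A reader that also appears under "released:" (here nb-2306, which is in both lists) appears to have no references left, so each attempt fails and is repeated. The hypothetical `ReaderCapture` class below is a simplified model of that all-or-nothing pattern, written only to illustrate the log message; `Reader`, `tryRefAll` and `capture` are illustrative names, not Cassandra's API, and the real implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical model (not Cassandra code) of "reference every selected reader or retry".
final class ReaderCapture {
    static final class Reader {
        final String name;
        private final AtomicInteger refs = new AtomicInteger(1); // the live view holds one reference

        Reader(String name) { this.name = name; }

        // Take a reference unless the count has already dropped to zero.
        boolean tryRef() {
            while (true) {
                int n = refs.get();
                if (n <= 0) return false;               // released: cannot be revived
                if (refs.compareAndSet(n, n + 1)) return true;
            }
        }

        void release() { refs.decrementAndGet(); }

        boolean isReleased() { return refs.get() <= 0; }

        @Override public String toString() { return name; }
    }

    // All-or-nothing: reference every reader, or undo the partial work and return null.
    static List<Reader> tryRefAll(List<Reader> readers) {
        List<Reader> acquired = new ArrayList<>();
        for (Reader r : readers) {
            if (!r.tryRef()) {
                for (Reader a : acquired) a.release();
                return null;
            }
            acquired.add(r);
        }
        return acquired;
    }

    // Re-select and retry. This model gives up after maxAttempts so that it terminates;
    // the repeated warnings suggest the real loop keeps retrying.
    static List<Reader> capture(Supplier<List<Reader>> selectView, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<Reader> view = selectView.get();
            List<Reader> refs = tryRefAll(view);
            if (refs != null) return refs;
            List<Reader> released = new ArrayList<>();
            for (Reader r : view) if (r.isReleased()) released.add(r);
            System.out.println("Spinning trying to capture readers " + view + ", released: " + released);
        }
        return null;
    }
}
```

In this model a released reader can never be referenced again, so capture only succeeds once a fresh selection no longer contains it. That would be consistent with the warnings stopping when the long compaction finishes, but it is an interpretation, not a confirmed diagnosis.
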

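A worked example of the timing, under a simplifying assumption: if compaction windows are aligned to whole multiples of the window size counted from the Unix epoch (1970-01-01T00:00:00Z), every 1-day window starts and ends at 00:00 UTC, so the instant at which all data in a window has outlived a 30-day TTL also falls on a midnight UTC. This is only one possible reading of the "exactly at 0 UTC" observation. The hypothetical `ExpiryWindows` helper below does the arithmetic; it also assumes cell timestamps equal write time, and adds `gc_grace_seconds` (Cassandra's default is 864000 seconds, 10 days) because expired cells are only purgeable after that grace period.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper. Assumes epoch-aligned windows and cell timestamps equal to write time.
final class ExpiryWindows {
    // Start of the epoch-aligned window that contains ts.
    static Instant windowStart(Instant ts, Duration window) {
        long size = window.toMillis();
        return Instant.ofEpochMilli(Math.floorDiv(ts.toEpochMilli(), size) * size);
    }

    // Earliest instant at which every cell written in ts's window is past TTL + gc_grace.
    static Instant windowPurgeableAt(Instant ts, Duration window, Duration ttl, Duration gcGrace) {
        return windowStart(ts, window).plus(window).plus(ttl).plus(gcGrace);
    }
}
```

For example, data written at 2022-06-10T09:00:00Z falls in the window starting at 2022-06-10T00:00:00Z; with a 30-day TTL and `gc_grace_seconds` of 0 the whole window is past expiry at 2022-07-11T00:00:00Z, and with the 10-day default at 2022-07-21T00:00:00Z.
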

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
