Posted to user@cassandra.apache.org by Olivier Rosello <or...@corp.free.fr> on 2010/07/07 12:05:45 UTC
High CPU usage on all nodes without any read or write
Hi,
We are testing Cassandra here and would like to use it to store the following data:
- about 1000 inserts/second into a CF "RAW":
  Column: TimeUUID (the TimeUUID of the insert, so 1000 new columns/second)
  Row: YYYYMMDDHH of the insert (to limit the size of rows; the biggest one holds 2 GB of data). For example, an insert made today at 1:30pm goes into row 2010070713.
- 4 CFs used as indexes, all with the same structure:
  Column: TimeUUID of the insert (about 300 new columns per second, each corresponding to the same TimeUUID as in the CF "RAW")
  Row: a String (Customer Id)
The goal is to be able to get all of a customer's "RAW" data for a given time range.
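The data model described above can be sketched in Python. This is a hypothetical client-side illustration, not code from the original setup: the helper names are made up, and it assumes the hourly row key is taken from the client's local clock. It shows which RAW rows a customer's time-range query has to read once the index CF has supplied the matching TimeUUIDs, and why each hourly row grows so large.

```python
from datetime import datetime, timedelta
import uuid


def raw_row_key(ts: datetime) -> str:
    """Hypothetical helper: the hourly RAW row key, formatted as YYYYMMDDHH."""
    return ts.strftime("%Y%m%d%H")


def raw_row_keys_for_range(start: datetime, end: datetime) -> list[str]:
    """Hypothetical helper: every hourly RAW row key that [start, end] touches."""
    keys = []
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour <= end:
        keys.append(raw_row_key(hour))
        hour += timedelta(hours=1)
    return keys


# A TimeUUID is a version-1 UUID; uuid.uuid1() builds one from the current time.
column_name = uuid.uuid1()

# At ~1000 inserts/second, a single hourly RAW row collects ~3.6 million columns.
COLUMNS_PER_HOURLY_ROW = 1000 * 3600

print(raw_row_key(datetime(2010, 7, 7, 13, 30)))  # 2010070713
print(raw_row_keys_for_range(datetime(2010, 7, 7, 12, 45),
                             datetime(2010, 7, 7, 14, 5)))
# ['2010070712', '2010070713', '2010070714']
```

Because every insert made during the same hour lands in the same RAW row, that row keeps growing for the whole hour.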
Cassandra currently runs on 4 nodes with 16 GB RAM each (JVM_MAX_MEM set to 8 GB).
It ran correctly for several days. Last night we started getting timeout exceptions on inserts and high CPU load on all nodes.
We stopped the inserts, but CPU usage remains high (without any inserts or reads).
We tried stopping and restarting the nodes; CPU usage remains high (between 100% and 150%).
So currently we can't add more data (timeout exceptions, due to the high CPU usage I guess).
Could you help me understand what's wrong? :)
Cheers,
Olivier
Here is some data:
We are using Cassandra 0.6.2
Address       Status  Load      Range                                    Ring
                                170141183460469231731687303715884105728
192.168.0.1   Up      23.43 GB  42535295865117307932921825928971026432   |<--|
192.168.0.2   Up      23.12 GB  85070591730234615865843651857942052864   |   |
192.168.0.3   Up      24.47 GB  127605887595351923798765477786913079296  |   |
192.168.0.4   Up      24.14 GB  170141183460469231731687303715884105728  |-->|
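A quick check on the ring output: the four tokens are exactly i × 2^127 / 4 for i = 1..4, i.e. evenly spaced around the ring, which together with the similar 23-24 GB loads suggests the ring itself is balanced. The sketch assumes RandomPartitioner (whose token space runs from 0 to 2^127); `balanced_tokens` is a hypothetical helper name.

```python
# Assumption: the cluster uses RandomPartitioner, whose token space is 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count: int) -> list[int]:
    """Evenly spaced tokens: node i (1-based) gets i * 2**127 / node_count."""
    return [i * RING_SIZE // node_count for i in range(1, node_count + 1)]


# Tokens copied from the ring output above.
observed = [
    42535295865117307932921825928971026432,
    85070591730234615865843651857942052864,
    127605887595351923798765477786913079296,
    170141183460469231731687303715884105728,
]

print(balanced_tokens(4) == observed)  # True
```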
Keyspace: system
    Read Count: 8
    Read Latency: 60.38625 ms.
    Write Count: 13
    Write Latency: 0.3583076923076923 ms.
    Pending Tasks: 0
        Column Family: HintsColumnFamily
        SSTable count: 1
        Space used (live): 103812
        Space used (total): 103812
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 5
        Read Latency: 83,440 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 1
        Key cache size: 1
        Key cache hit rate: 0.6666666666666666
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: LocationInfo
        SSTable count: 3
        Space used (live): 3999
        Space used (total): 3999
        Memtable Columns Count: 7
        Memtable Data Size: 227
        Memtable Switch Count: 2
        Read Count: 3
        Read Latency: 21,964 ms.
        Write Count: 13
        Write Latency: 0,358 ms.
        Pending Tasks: 0
        Key cache capacity: 3
        Key cache size: 3
        Key cache hit rate: 0.2
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0
----------------
Keyspace: MyKeySpace
    Read Count: 5
    Read Latency: 22737.799 ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Tasks: 0
        Column Family: INDEX1
        SSTable count: 5
        Space used (live): 1502023165
        Space used (total): 1502023165
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 1
        Read Latency: 0,876 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: INDEX2
        SSTable count: 3
        Space used (live): 1262977087
        Space used (total): 1262977087
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 1
        Read Latency: 0,547 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: RAW
        SSTable count: 6
        Space used (live): 18867913004
        Space used (total): 18867913004
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 1
        Read Latency: 113685,183 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 3
        Key cache hit rate: 0.0
        Row cache: disabled
        Compacted row minimum size: 215734
        Compacted row maximum size: 1055255881
        Compacted row mean size: 491170239

        Column Family: INDEX3
        SSTable count: 6
        Space used (live): 1667728231
        Space used (total): 1667728231
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 1
        Read Latency: 1,241 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

        Column Family: INDEX4
        SSTable count: 7
        Space used (live): 2620328669
        Space used (total): 2620328669
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 1
        Read Latency: 1,148 ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 200000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache: disabled
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0
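Some per-CF latencies in this output are printed with a decimal comma (for example "113685,183 ms"), most likely because of the JVM's locale. The keyspace-level figures are consistent with being read-count-weighted means of the per-CF figures, which supports reading those commas as decimal points: RAW's single read took roughly 113.7 seconds, while each index read took about a millisecond. A minimal Python check using only the numbers above (the helper name is hypothetical):

```python
# (read count, mean read latency in ms) per CF, copied from the cfstats output above.
my_keyspace_reads = {
    "INDEX1": (1, 0.876),
    "INDEX2": (1, 0.547),
    "RAW": (1, 113685.183),
    "INDEX3": (1, 1.241),
    "INDEX4": (1, 1.148),
}
system_reads = {
    "HintsColumnFamily": (5, 83.440),
    "LocationInfo": (3, 21.964),
}


def keyspace_mean_latency(cf_reads: dict[str, tuple[int, float]]) -> float:
    """Weighted mean of per-CF mean latencies, weighted by read count."""
    total_reads = sum(count for count, _ in cf_reads.values())
    total_ms = sum(count * latency for count, latency in cf_reads.values())
    return total_ms / total_reads


print(round(keyspace_mean_latency(my_keyspace_reads), 3))  # 22737.799
# Close to the reported 60.38625 ms; the per-CF values are themselves rounded.
print(keyspace_mean_latency(system_reads))
```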
Re: High CPU usage on all nodes without any read or write
Posted by yoshiyuki kanno <ne...@gmail.com>.
Hi
If this is caused by compaction,
lowering the priority of the compaction thread might help
(Cassandra 0.6.3 offers this option;
see also the 0.6.3 changelog).
Re: High CPU usage on all nodes without any read or write
Posted by Peter Schuller <pe...@infidyne.com>.
> It runs correctly during several days. Last night, we started to have timeout exception on insert and high cpu load on all nodes.
>
> We stopped inserts. But the CPU remains high (without any insert or read).
Has data been written to the cluster faster than background compaction
is proceeding? If so, you may see Cassandra eating CPU (and doing I/O)
in the background for extended periods of time, even after you stop
sending requests to it.
If this is what is happening, it should be visible in the log that it's
doing compaction, and you should see that the data directories contain
lots of files (unless it's just now catching up) rather than the
fairly small number you would expect when compaction is keeping up.
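The check on the data directories can be scripted. A minimal sketch, assuming SSTable data files follow the naming `<ColumnFamily>-<generation>-Data.db` (an assumption about the on-disk layout) and using a hypothetical directory path; a per-CF count that keeps growing, or sits well above what you normally see, would fit the "compaction is falling behind" explanation. The `SSTable count` lines in `nodetool cfstats` give a similar per-CF view from the server side.

```python
import os
import re
from collections import Counter
from typing import Iterable

# Assumption: SSTable data files are named "<ColumnFamily>-<generation>-Data.db".
DATA_FILE = re.compile(r"^(?P<cf>.+)-(?P<generation>\d+)-Data\.db$")


def count_data_files(filenames: Iterable[str]) -> Counter:
    """Count SSTable data files per column family, ignoring index/filter files."""
    counts: Counter = Counter()
    for name in filenames:
        match = DATA_FILE.match(name)
        if match:
            counts[match.group("cf")] += 1
    return counts


def sstables_per_cf(keyspace_dir: str) -> Counter:
    """Count SSTable data files per column family in one keyspace directory."""
    return count_data_files(os.listdir(keyspace_dir))


# Hypothetical usage on one node (adjust the path to your data directory):
# print(sstables_per_cf("/var/lib/cassandra/data/MyKeySpace"))
```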
Also consider that even if you're not writing faster than it can
handle, if you have lots of data in total, the bigger compactions will
take a considerable amount of time, so you may see CPU and disk activity
for long periods even if all is otherwise well.
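The point about large compactions can be put in rough numbers. The sketch below uses an idealized size-tiered model, which is an assumption rather than a description of Cassandra's exact compaction code: whenever `threshold` SSTables of similar size exist they are merged into one about `threshold` times larger, overwrites and deletions are ignored, and each byte is therefore rewritten once per tier. The threshold of 4 and the 64 MB flush size are assumed values; the RAW size is the "Space used (live)" figure from the cfstats output.

```python
def compaction_tiers(total_bytes: int, flush_bytes: int, threshold: int = 4) -> int:
    """Number of size tiers an idealized size-tiered scheme needs to go from
    flush-sized SSTables to a single SSTable of total_bytes."""
    tiers = 0
    size = flush_bytes
    while size < total_bytes:
        size *= threshold  # `threshold` similar-sized SSTables merge into one
        tiers += 1
    return tiers


def bytes_rewritten(total_bytes: int, flush_bytes: int, threshold: int = 4) -> int:
    """Rough compaction write volume: every byte is rewritten once per tier."""
    return total_bytes * compaction_tiers(total_bytes, flush_bytes, threshold)


RAW_LIVE_BYTES = 18867913004        # "Space used (live)" for RAW, from cfstats
FLUSH_BYTES = 64 * 1024 * 1024      # assumed memtable flush size (hypothetical)

print(compaction_tiers(RAW_LIVE_BYTES, FLUSH_BYTES))  # 5
print(bytes_rewritten(RAW_LIVE_BYTES, FLUSH_BYTES) / 1e9)  # about 94.3 (GB)
```

In this model the final merge alone rewrites all ~18.9 GB of RAW on that node, so it can keep the CPU and disks busy for a long time regardless of the current request rate.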
Of course, you say you're seeing timeouts. Is it possible these are
timeouts that happen during compaction in general? What kind of
latency are we talking about (a few extra hundred millis or several
seconds?), and is there a correlation between the timeouts and lots of
data being flushed to disk (iostat -x -k 1)?
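One way to answer the correlation question is to compare when timeouts happen with when the node was busy flushing or compacting (taken, for example, from compaction start/finish messages in the server log, or from periods of heavy writes in `iostat -x -k 1`). A minimal sketch with made-up timestamps; the helper names and example data are hypothetical. If a much larger share of timeouts falls inside busy windows than the share of time those windows cover, that points to a correlation.

```python
from datetime import datetime, timedelta

Window = tuple[datetime, datetime]


def fraction_in_windows(events: list[datetime], windows: list[Window]) -> float:
    """Share of events (e.g. client timeouts) that fall inside any busy window."""
    if not events:
        return 0.0
    hits = sum(1 for t in events if any(start <= t <= end for start, end in windows))
    return hits / len(events)


def time_coverage(windows: list[Window], period_start: datetime, period_end: datetime) -> float:
    """Share of the observation period covered by windows (assumed non-overlapping)."""
    covered = sum(
        (min(end, period_end) - max(start, period_start)
         for start, end in windows
         if end > period_start and start < period_end),
        timedelta(),
    )
    return covered / (period_end - period_start)


# Hypothetical example: one 10-minute compaction window in a 1-hour period.
t0 = datetime(2010, 7, 7, 0, 0)
busy = [(t0 + timedelta(minutes=10), t0 + timedelta(minutes=20))]
timeouts = [t0 + timedelta(minutes=m) for m in (12, 15, 18, 40)]

print(fraction_in_windows(timeouts, busy))               # 0.75
print(time_coverage(busy, t0, t0 + timedelta(hours=1)))  # about 0.167
```

Here 75% of the (made-up) timeouts land in windows covering about 17% of the hour, which is the kind of imbalance that would suggest flushes or compactions are behind them.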
--
/ Peter Schuller