Posted to user@cassandra.apache.org by Olivier Rosello <or...@corp.free.fr> on 2010/07/07 12:05:45 UTC

High CPU usage on all nodes without any read or write

Hi,

We are testing Cassandra and would like to use it to store some data:
- about 1000 inserts / second into a CF "RAW":
   Column: TimeUUID (the timeuuid of the insert, so 1000 new columns / second)
   Row: YYYYMMDDHH of the insert (to limit the size of rows; the biggest one holds 2 GB of data). For example, for an insert today at 1:30pm, the row key is 2010070713
- 4 CFs that we use as indexes, all with the same structure:
   Column: TimeUUID of the insert (about 300 new columns per second, each corresponding to the same timeuuid in the CF "RAW")
   Row: a String (Customer Id)

The goal is, for a given Customer, to be able to get all of its "RAW" data for a time range.
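
A minimal sketch of the hourly row-key scheme described above, to make the bucket size concrete. This is not the actual client code; the function names `raw_row_key` and `columns_per_row` are made up for illustration, and the rate is the approximate figure quoted above. At about 1000 inserts per second, one hourly row collects roughly 3.6 million columns.

```python
from datetime import datetime

INSERTS_PER_SECOND = 1000  # approximate rate quoted in the message


def raw_row_key(ts: datetime) -> str:
    """Return the hourly bucket key (YYYYMMDDHH) for an insert time."""
    return ts.strftime("%Y%m%d%H")


def columns_per_row(inserts_per_second: int = INSERTS_PER_SECOND) -> int:
    """Estimate how many columns land in one hourly row."""
    return inserts_per_second * 3600


if __name__ == "__main__":
    print(raw_row_key(datetime(2010, 7, 7, 13, 30)))  # 2010070713
    print(columns_per_row())  # 3600000
```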

Currently Cassandra is running on 4 nodes with 16 GB RAM each and 8 GB JVM_MAX_MEM.

It ran correctly for several days. Last night, we started to get timeout exceptions on inserts and high CPU load on all nodes.

We stopped the inserts, but CPU usage remains high (without any insert or read).

We tried stopping and restarting the nodes, but CPU usage remains high (between 100% and 150%).

So currently we can't add more data (timeout exceptions, due to the high CPU usage I guess).

Could you help me understand what's wrong? :)

Cheers,

Olivier


Here is some data:

We are using Cassandra 0.6.2

Address       Status     Load          Range                                      Ring
                                       170141183460469231731687303715884105728    
192.168.0.1  Up         23.43 GB      42535295865117307932921825928971026432     |<--|
192.168.0.2  Up         23.12 GB      85070591730234615865843651857942052864     |   |
192.168.0.3  Up         24.47 GB      127605887595351923798765477786913079296    |   |
192.168.0.4  Up         24.14 GB      170141183460469231731687303715884105728    |-->|



Keyspace: system
	Read Count: 8
	Read Latency: 60.38625 ms.
	Write Count: 13
	Write Latency: 0.3583076923076923 ms.
	Pending Tasks: 0
		Column Family: HintsColumnFamily
		SSTable count: 1
		Space used (live): 103812
		Space used (total): 103812
		Memtable Columns Count: 0
		Memtable Data Size: 0
		Memtable Switch Count: 0
		Read Count: 5
		Read Latency: 83,440 ms.
		Write Count: 0
		Write Latency: NaN ms.
		Pending Tasks: 0
		Key cache capacity: 1
		Key cache size: 1
		Key cache hit rate: 0.6666666666666666
		Row cache: disabled
		Compacted row minimum size: 0
		Compacted row maximum size: 0
		Compacted row mean size: 0

		Column Family: LocationInfo
		SSTable count: 3
		Space used (live): 3999
		Space used (total): 3999
		Memtable Columns Count: 7
		Memtable Data Size: 227
		Memtable Switch Count: 2
		Read Count: 3
		Read Latency: 21,964 ms.
		Write Count: 13
		Write Latency: 0,358 ms.
		Pending Tasks: 0
		Key cache capacity: 3
		Key cache size: 3
		Key cache hit rate: 0.2
		Row cache: disabled
		Compacted row minimum size: 0
		Compacted row maximum size: 0
		Compacted row mean size: 0

----------------
Keyspace: MyKeySpace
	Read Count: 5
	Read Latency: 22737.799 ms.
	Write Count: 0
	Write Latency: NaN ms.
	Pending Tasks: 0
		Column Family: INDEX1
		SSTable count: 5
		Space used (live): 1502023165
		Space used (total): 1502023165
		Memtable Columns Count: 0
		Memtable Data Size: 0
		Memtable Switch Count: 0
		Read Count: 1
		Read Latency: 0,876 ms.
		Write Count: 0
		Write Latency: NaN ms.
		Pending Tasks: 0
		Key cache capacity: 200000
		Key cache size: 0
		Key cache hit rate: NaN
		Row cache: disabled
		Compacted row minimum size: 0
		Compacted row maximum size: 0
		Compacted row mean size: 0

		Column Family: INDEX2
		SSTable count: 3
		Space used (live): 1262977087
		Space used (total): 1262977087
		Memtable Columns Count: 0
		Memtable Data Size: 0
		Memtable Switch Count: 0
		Read Count: 1
		Read Latency: 0,547 ms.
		Write Count: 0
		Write Latency: NaN ms.
		Pending Tasks: 0
		Key cache capacity: 200000
		Key cache size: 0
		Key cache hit rate: NaN
		Row cache: disabled
		Compacted row minimum size: 0
		Compacted row maximum size: 0
		Compacted row mean size: 0

		Column Family: RAW
		SSTable count: 6
		Space used (live): 18867913004
		Space used (total): 18867913004
		Memtable Columns Count: 0
		Memtable Data Size: 0
		Memtable Switch Count: 0
		Read Count: 1
		Read Latency: 113685,183 ms.
		Write Count: 0
		Write Latency: NaN ms.
		Pending Tasks: 0
		Key cache capacity: 200000
		Key cache size: 3
		Key cache hit rate: 0.0
		Row cache: disabled
		Compacted row minimum size: 215734
		Compacted row maximum size: 1055255881
		Compacted row mean size: 491170239

		Column Family: INDEX3
		SSTable count: 6
		Space used (live): 1667728231
		Space used (total): 1667728231
		Memtable Columns Count: 0
		Memtable Data Size: 0
		Memtable Switch Count: 0
		Read Count: 1
		Read Latency: 1,241 ms.
		Write Count: 0
		Write Latency: NaN ms.
		Pending Tasks: 0
		Key cache capacity: 200000
		Key cache size: 0
		Key cache hit rate: NaN
		Row cache: disabled
		Compacted row minimum size: 0
		Compacted row maximum size: 0
		Compacted row mean size: 0

		Column Family: INDEX4
		SSTable count: 7
		Space used (live): 2620328669
		Space used (total): 2620328669
		Memtable Columns Count: 0
		Memtable Data Size: 0
		Memtable Switch Count: 0
		Read Count: 1
		Read Latency: 1,148 ms.
		Write Count: 0
		Write Latency: NaN ms.
		Pending Tasks: 0
		Key cache capacity: 200000
		Key cache size: 0
		Key cache hit rate: NaN
		Row cache: disabled
		Compacted row minimum size: 0
		Compacted row maximum size: 0
		Compacted row mean size: 0
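
The byte counts reported for the RAW column family above are easier to judge in larger units. A small sketch that converts them, with the values copied from the output above (the helper name `to_mib` and the `RAW_STATS` dictionary are illustrative only). The mean compacted row works out to roughly 468 MiB and the largest to just under 1 GiB.

```python
MIB = 1024 ** 2


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


# Values copied from the RAW column family statistics above.
RAW_STATS = {
    "Space used (live)": 18867913004,
    "Compacted row minimum size": 215734,
    "Compacted row maximum size": 1055255881,
    "Compacted row mean size": 491170239,
}

if __name__ == "__main__":
    for name, value in RAW_STATS.items():
        print(f"{name}: {to_mib(value):.1f} MiB")
```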

Re: High CPU usage on all nodes without any read or write

Posted by yoshiyuki kanno <ne...@gmail.com>.
Hi,

If compaction is the cause, changing the priority of the compaction thread
might help (Cassandra 0.6.3 offers this option).

See also the 0.6.3 changelog.


Re: High CPU usage on all nodes without any read or write

Posted by Peter Schuller <pe...@infidyne.com>.
> It runs correctly during several days. Last night, we started to have timeout exception on insert and high cpu load on all nodes.
>
> We stopped inserts. But the CPU remains high (without any insert or read).

Has data been written to the cluster faster than background compaction
can keep up? If so, you may see Cassandra eating CPU (and doing I/O)
in the background for extended periods of time even after you stop
sending requests to it.

If this is what is happening, the log should show that it's
doing compaction, and you should see that the data directories contain
lots of files (unless it's just now catching up) rather than the
fairly small number you would expect when compaction is keeping up.
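
One way to check the file counts is a short script like the sketch below. It assumes SSTable data files are named `<ColumnFamily>-<generation>-Data.db` and that each keyspace has its own data directory; adjust both if your layout differs. The function name `count_sstables` is made up for this example.

```python
import os
import re
from collections import Counter

# Assumed naming scheme: <ColumnFamily>-<generation>-Data.db
DATA_FILE = re.compile(r"^(?P<cf>.+)-(?P<gen>\d+)-Data\.db$")


def count_sstables(data_dir: str) -> Counter:
    """Count SSTable data files per column family in one keyspace directory."""
    counts = Counter()
    for name in os.listdir(data_dir):
        match = DATA_FILE.match(name)
        if match:
            counts[match.group("cf")] += 1
    return counts
```

For example, `count_sstables("/var/lib/cassandra/data/MyKeySpace")` (a hypothetical path) returns a mapping from column family name to file count. A count that keeps growing while no writes arrive would suggest compaction is still behind.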

Also consider that even if you're not writing faster than it can
handle, if you have lots of data in total, the bigger compactions will
take a considerable amount of time, so you may see CPU and disk activity
for long periods even if all is otherwise well.

Of course, you say you're seeing timeouts. Is it possible these are
timeouts that happen during compaction in general? What kind of
latency are we talking about (a few hundred extra milliseconds or several
seconds?), and is there a correlation between the timeouts and lots of
data being flushed to disk (iostat -x -k 1)?

-- 
/ Peter Schuller