Posted to user@cassandra.apache.org by Radim Kolar <hs...@filez.com> on 2012/03/23 09:44:58 UTC
Estimation of memtable size are wrong
I wonder why memtable size estimates are so bad.
1. Is it not possible to run them more often? There should be some limit -
run the live/serialized calculation at least once per hour. It takes just
a few seconds.
2. Why not use data from FlushWriter to update the estimates? The flusher
knows the number of ops and the serialized size after the sstable is written
to disk. These values should be used for updating the memtable
live/serialized ratio.
INFO [OptionalTasks:1] 2012-03-23 09:33:51,765 MeteredFlusher.java
(line 62) flushing high-traffic column family CFS(Keyspace='whois',
ColumnFamily='ipbans') (estimated 105363280 bytes)
INFO [OptionalTasks:1] 2012-03-23 09:33:51,796 ColumnFamilyStore.java
(line 704) Enqueuing flush of
Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755
ops)
** It should be noted here that the live/serialized size is ESTIMATED!! **
INFO [FlushWriter:314] 2012-03-23 09:33:51,796 Memtable.java (line
246) Writing Memtable-ipbans@481336682(1317041/105363280 serialized/live
bytes, 16755 ops)
INFO [FlushWriter:314] 2012-03-23 09:33:51,799 Memtable.java (line
283) Completed flushing
/var/lib/cassandra/data/whois/ipbans-hc-16775-Data.db (1355 bytes)
Re: Estimation of memtable size are wrong
Posted by aaron morton <aa...@thelastpickle.com>.
> Yes, I noticed that. It's not very often, about once per week.
The assumption would be that the workload stabilises over time.
> INFO [MemoryMeter:1] 2012-03-23 00:00:18,407 Memtable.java (line 186) CFS(Keyspace='whois', ColumnFamily='ipbans') liveRatio is 64.0 (just-counted was 16.354632747474547). calculation took 611ms for 8287 columns
Duh, forgot about the 25% fudge factor. 64 * 1.25 = 80.
It's working as intended. The serialised bytes is the total throughput, which includes overwrites.
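
The arithmetic above can be checked against the "Enqueuing flush" log line from the start of the thread. This is a minimal Python sketch, assuming the live size is estimated as serialized bytes times live ratio times the fudge factor; the function name and the exact formula are illustrative assumptions, not Cassandra's actual code.

```python
# Values taken from the "Enqueuing flush" log line in this thread.
SERIALIZED_BYTES = 1317041     # serialized throughput, including overwrites
LOGGED_LIVE_BYTES = 105363280  # "live" size reported in the same log line

LIVE_RATIO = 64.0    # liveRatio reported by the MemoryMeter log line
FUDGE_FACTOR = 1.25  # the 25% fudge factor mentioned above


def estimated_live_bytes(serialized_bytes, live_ratio, fudge=FUDGE_FACTOR):
    """Hypothetical helper: estimate live size from serialized throughput.

    Assumed formula: serialized bytes * live ratio * fudge factor.
    """
    return int(serialized_bytes * live_ratio * fudge)


estimate = estimated_live_bytes(SERIALIZED_BYTES, LIVE_RATIO)
effective_ratio = LOGGED_LIVE_BYTES / SERIALIZED_BYTES

print(estimate)         # matches the logged live size
print(effective_ratio)  # 64 * 1.25
```

Under that assumption the estimate reproduces the logged 105363280 bytes, and the effective ratio comes out at 80.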
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 26/03/2012, at 9:11 PM, Radim Kolar wrote:
> On 26.3.2012 0:36, aaron morton wrote:
>>> 1. Is it not possible to run them more often? There should be some limit - run the live/serialized calculation at least once per hour. It takes just a few seconds.
>> The live ratio is updated every time the operation count (since startup) for the CF doubles.
> Yes, I noticed that. It's not very often, about once per week.
>
>> The ratio here is a strange 105363280 (100.48 MB) / 1317041 (1.26 MB) = 80. The live ratio is capped at 64.
>> Can you see any log messages about the live ratio for this CF ?
>
> Last report from problematic CF:
> INFO [MemoryMeter:1] 2012-03-23 00:00:18,407 Memtable.java (line 186) CFS(Keyspace='whois', ColumnFamily='ipbans') liveRatio is 64.0 (just-counted was 16.354632747474547). calculation took 611ms for 8287 columns
Re: Estimation of memtable size are wrong
Posted by Radim Kolar <hs...@filez.com>.
On 26.3.2012 0:36, aaron morton wrote:
>> 1. Is it not possible to run them more often? There should be some
>> limit - run the live/serialized calculation at least once per hour. It
>> takes just a few seconds.
> The live ratio is updated every time the operation count (since
> startup) for the CF doubles.
Yes, I noticed that. It's not very often, about once per week.
> The ratio here is a strange 105363280 (100.48 MB) / 1317041 (1.26 MB)
> = 80. The live ratio is capped at 64.
> Can you see any log messages about the live ratio for this CF ?
Last report from problematic CF:
INFO [MemoryMeter:1] 2012-03-23 00:00:18,407 Memtable.java (line 186)
CFS(Keyspace='whois', ColumnFamily='ipbans') liveRatio is 64.0
(just-counted was 16.354632747474547). calculation took 611ms for 8287
columns
Re: Estimation of memtable size are wrong
Posted by aaron morton <aa...@thelastpickle.com>.
> 1. Is it not possible to run them more often? There should be some limit - run the live/serialized calculation at least once per hour. It takes just a few seconds.
The live ratio is updated every time the operation count (since startup) for the CF doubles.
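
The doubling rule explains why recalculation becomes rare on a long-running node: at a steady write rate, each gap between measurements is twice as long as the previous one. The following is a minimal Python sketch of that rule only; the class, the function, and the batch size are hypothetical and are not taken from Cassandra's source.

```python
class LiveRatioSchedule:
    """Hypothetical helper: decides when the live ratio is re-measured."""

    def __init__(self, first_threshold=1):
        self.total_ops = 0                     # operations since startup
        self.next_threshold = first_threshold  # op count that triggers a measurement

    def record_ops(self, count):
        """Add `count` operations; return True if a re-measurement is due."""
        self.total_ops += count
        if self.total_ops < self.next_threshold:
            return False
        # Doubling rule: wait until the op count has doubled again.
        self.next_threshold = self.total_ops * 2
        return True


def measurement_points(total_ops, batch=1000):
    """Return the op counts at which a measurement would be triggered."""
    schedule = LiveRatioSchedule(first_threshold=batch)
    points = []
    for _ in range(total_ops // batch):
        if schedule.record_ops(batch):
            points.append(schedule.total_ops)
    return points


points = measurement_points(64_000)
print(points)
```

With a constant write rate, 64,000 operations trigger only seven measurements in this sketch, and the last gap covers half of all operations seen so far.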
> 2. Why not use data from FlushWriter to update the estimates? The flusher knows the number of ops and the serialized size after the sstable is written to disk. These values should be used for updating the memtable live/serialized ratio.
The problem is tracking the live memory usage. Ops count and serialised bytes are tracked by the memtable; note that serialised bytes is the throughput in bytes, not the amount that will be written to disk.
> INFO [OptionalTasks:1] 2012-03-23 09:33:51,796 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755 ops)
> ** It should be noted here that the live/serialized size is ESTIMATED!! **
The serialised figure is the serialised byte throughput for the memtable, including overwrites.
The ratio here is a strange 105363280 (100.48 MB) / 1317041 (1.26 MB) = 80. The live ratio is capped at 64.
Can you see any log messages about the live ratio for this CF ?
> INFO [FlushWriter:314] 2012-03-23 09:33:51,799 Memtable.java (line 283) Completed flushing /var/lib/cassandra/data/whois/ipbans-hc-16775-Data.db (1355 bytes)
The small file may be the result of a lot of overwrites and something odd happening with the live ratio. Is compression on?
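
To illustrate how overwrites can make the serialised throughput much larger than the flushed file, here is a toy Python sketch. It assumes a memtable that keeps only the newest value per key while counting every write towards throughput; the class and its fields are hypothetical and far simpler than a real memtable.

```python
class ToyMemtable:
    """Hypothetical toy memtable: one value per key, newest write wins."""

    def __init__(self):
        self.columns = {}               # key -> latest value
        self.serialized_throughput = 0  # bytes of every write, overwrites included
        self.ops = 0

    def put(self, key, value):
        self.columns[key] = value       # an overwrite replaces the old value
        self.serialized_throughput += len(key) + len(value)
        self.ops += 1

    def flushed_size(self):
        """Bytes that would be written if only the surviving values were flushed."""
        return sum(len(k) + len(v) for k, v in self.columns.items())


memtable = ToyMemtable()
value = b"x" * 16
for i in range(10_000):
    key = ("key-%d" % (i % 10)).encode()  # only 10 distinct keys, heavily overwritten
    memtable.put(key, value)

print(memtable.ops, memtable.serialized_throughput, memtable.flushed_size())
```

In this sketch 10,000 writes to 10 keys produce 210,000 bytes of throughput but only 210 bytes of surviving data, the same kind of gap as between the 1317041 serialised bytes and the 1355-byte data file in the log.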
Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com