Posted to user@cassandra.apache.org by Radim Kolar <hs...@filez.com> on 2012/03/23 09:44:58 UTC

Estimation of memtable size are wrong

I wonder why the memtable size estimates are so bad.

1. Is it not possible to run them more often? There should be some limit: 
run the live/serialized calculation at least once per hour. It takes just 
a few seconds.
2. Why not use data from FlushWriter to update the estimates? The flusher 
knows the number of ops and the serialized size after the sstable is 
written to disk. These values should be used for updating the memtable 
live/serialized ratio.

  INFO [OptionalTasks:1] 2012-03-23 09:33:51,765 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='whois', ColumnFamily='ipbans') (estimated 105363280 bytes)
  INFO [OptionalTasks:1] 2012-03-23 09:33:51,796 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755 ops)
  ** It should be noted here that the live/serialized size is ESTIMATED!! **
  INFO [FlushWriter:314] 2012-03-23 09:33:51,796 Memtable.java (line 246) Writing Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755 ops)
  INFO [FlushWriter:314] 2012-03-23 09:33:51,799 Memtable.java (line 283) Completed flushing /var/lib/cassandra/data/whois/ipbans-hc-16775-Data.db (1355 bytes)


Re: Estimation of memtable size are wrong

Posted by aaron morton <aa...@thelastpickle.com>.
> Yes i noticed that. Its not too often, about 1 times per week.
The assumption would be that the workload stabilises over time.

> INFO [MemoryMeter:1] 2012-03-23 00:00:18,407 Memtable.java (line 186) CFS(Keyspace='whois', ColumnFamily='ipbans') liveRatio is 64.0 (just-counted was 16.354632747474547).  calculation took 611ms for 8287 columns
Duh, forgot about the 25% fudge factor. 64 * 1.25 = 80. 

It's working as intended. The serialised bytes figure is the total write throughput, which includes overwrites.
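
The arithmetic can be checked against the figures logged in this thread. This is only a minimal sketch: it assumes the estimated live size is the serialised throughput multiplied by the live ratio and then by the 25% fudge factor. The helper name estimate_live_bytes is hypothetical and is not Cassandra code.

```python
# Figures copied from the log lines quoted in this thread.
SERIALIZED_BYTES = 1317041   # serialised throughput reported for the memtable
LIVE_RATIO = 64.0            # liveRatio reported by MemoryMeter
FUDGE_FACTOR = 1.25          # the 25% fudge factor mentioned above


def estimate_live_bytes(serialized_bytes, live_ratio, fudge=FUDGE_FACTOR):
    """Hypothetical helper; the formula is an assumption, not Cassandra source."""
    return serialized_bytes * live_ratio * fudge


estimated = estimate_live_bytes(SERIALIZED_BYTES, LIVE_RATIO)
effective_ratio = LIVE_RATIO * FUDGE_FACTOR

print(int(estimated))     # 105363280, the "estimated ... bytes" value in the log
print(effective_ratio)    # 80.0
```

Under that assumption the logged estimate of 105363280 bytes is reproduced exactly, which is consistent with the effective ratio of 80.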

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/03/2012, at 9:11 PM, Radim Kolar wrote:

> On 26.3.2012 0:36, aaron morton wrote:
>>> 1. its not possible to run them more often? There should be some limit - run live/serialized calculation at least once per hour. They took just few seconds.
>> The live ratio is updated every time the operation count (since startup) for the CF doubles.
> Yes i noticed that. Its not too often, about 1 times per week.
> 
>> The ratio here is a strange 105363280 (100.48 MB) / 1317041 (1.26 MB) = 80. The live ratio is capped at 64.
>> Can you see any log messages about the live ratio for this CF ?
> 
> Last report from problematic CF:
> INFO [MemoryMeter:1] 2012-03-23 00:00:18,407 Memtable.java (line 186) CFS(Keyspace='whois', ColumnFamily='ipbans') liveRatio is 64.0 (just-counted was 16.354632747474547).  calculation took 611ms for 8287 columns


Re: Estimation of memtable size are wrong

Posted by Radim Kolar <hs...@filez.com>.
On 26.3.2012 0:36, aaron morton wrote:
>> 1. its not possible to run them more often? There should be some 
>> limit - run live/serialized calculation at least once per hour. They 
>> took just few seconds.
> The live ratio is updated every time the operation count (since 
> startup) for the CF doubles.
Yes, I noticed that. It's not very often, about once per week.

> The ratio here is a strange 105363280 (100.48 MB) / 1317041 (1.26 MB) 
> = 80. The live ratio is capped at 64.
> Can you see any log messages about the live ratio for this CF ?

Last report from the problematic CF:
  INFO [MemoryMeter:1] 2012-03-23 00:00:18,407 Memtable.java (line 186) CFS(Keyspace='whois', ColumnFamily='ipbans') liveRatio is 64.0 (just-counted was 16.354632747474547).  calculation took 611ms for 8287 columns

Re: Estimation of memtable size are wrong

Posted by aaron morton <aa...@thelastpickle.com>.
> 1. its not possible to run them more often? There should be some limit - run live/serialized calculation at least once per hour. They took just few seconds.
The live ratio is updated every time the operation count (since startup) for the CF doubles. 
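
To see why a doubling rule recalculates less and less often under a steady write rate, here is a rough sketch. The class DoublingTrigger, the first threshold of 1000 ops and the constant 1000 ops per hour are all assumptions made for illustration; none of it is taken from the Cassandra source.

```python
class DoublingTrigger:
    """Hypothetical model: fire whenever the op count has doubled since the last firing."""

    def __init__(self, first_threshold):
        self.ops = 0
        self.next_threshold = first_threshold

    def record(self, new_ops):
        """Add new_ops to the running total; return True if a recalculation is due."""
        self.ops += new_ops
        if self.ops >= self.next_threshold:
            # The next recalculation waits until the total has doubled again.
            self.next_threshold = self.ops * 2
            return True
        return False


trigger = DoublingTrigger(first_threshold=1000)   # assumed starting threshold
# Assume a constant 1000 ops per hour for 100 hours.
fired_at = [hour for hour in range(1, 101) if trigger.record(1000)]
print(fired_at)   # [1, 2, 4, 8, 16, 32, 64]
```

With a constant write rate the gap between recalculations doubles each time, so on a node that has been up for a long time they become rare.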

> 2. Why not use data from FlusherWriter to update estimations? Flusher knows number of ops and serialized size after sstable is written to disk. These values should be used for updating memtable live/serialized ratio.
The problem is tracking the live memory usage. Ops count and serialised bytes are tracked by the memtable; note that serialised bytes is the throughput bytes, not the amount that will be written to disk.
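
A toy model of the difference between throughput bytes and what ends up on disk. ToyMemtable and its fields are hypothetical and much simpler than a real memtable; the sketch only shows how overwrites inflate the throughput counter.

```python
class ToyMemtable:
    """Hypothetical stand-in for a memtable: latest value per key plus counters."""

    def __init__(self):
        self.columns = {}            # key -> latest value only
        self.throughput_bytes = 0    # every write counts, including overwrites
        self.ops = 0

    def write(self, key: bytes, value: bytes):
        self.columns[key] = value                        # an overwrite replaces the old value
        self.throughput_bytes += len(key) + len(value)   # but still adds to throughput
        self.ops += 1

    def bytes_to_flush(self):
        """Size of the data that would actually be written out."""
        return sum(len(k) + len(v) for k, v in self.columns.items())


memtable = ToyMemtable()
for _ in range(1000):
    memtable.write(b"ip:10.0.0.1", b"0123456789")   # same key written 1000 times

print(memtable.ops)                # 1000
print(memtable.throughput_bytes)   # 21000
print(memtable.bytes_to_flush())   # 21
```

A workload dominated by overwrites can therefore report a large serialised figure while flushing a very small file.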

> INFO [OptionalTasks:1] 2012-03-23 09:33:51,796 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755 ops)
> ** Here should be noted that live/serialized size is ESTIMATED!! **
Serialised is the serialised byte throughput for the memtable, including overwrites.

The ratio here is a strange 105363280 (100.48 MB) / 1317041 (1.26 MB) = 80. The live ratio is capped at 64.
Can you see any log messages about the live ratio for this CF ? 
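
For reference, the ratio and the MB figures can be reproduced from the two numbers in the "Enqueuing flush" log line. A small sketch; MB here is assumed to mean 1024 * 1024 bytes, which is what matches the figures quoted above.

```python
# Values from the "Enqueuing flush" log line in this thread.
live_bytes = 105363280
serialized_bytes = 1317041

MB = 1024 * 1024          # assumption: MB means 2**20 bytes
LIVE_RATIO_CAP = 64.0     # the cap mentioned above

observed_ratio = live_bytes / serialized_bytes

print(round(live_bytes / MB, 2))         # 100.48
print(round(serialized_bytes / MB, 2))   # 1.26
print(observed_ratio)                    # 80.0
print(observed_ratio > LIVE_RATIO_CAP)   # True, hence "strange"
```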

> INFO [FlushWriter:314] 2012-03-23 09:33:51,799 Memtable.java (line 283) Completed flushing /var/lib/cassandra/data/whois/ipbans-hc-16775-Data.db (1355 bytes)
The small file may be the result of a lot of overwrites and something odd happening with the live ratio. Is compression on?

Cheers
