Posted to user@cassandra.apache.org by Luke Jolly <lu...@getadmiral.com> on 2016/07/27 18:44:30 UTC

Approximate row count

I have a table that stores ad impression data, with each row being one
impression.  I want to get a count of total rows / impressions.  I know
that there are somewhere in the ballpark of 200-400 million rows in this
table, and from my reading, "Number of keys" in the output of cfstats should
be a reasonably accurate estimate. However, it is 39434. Am I misunderstanding
something? Every node in my cluster has a complete copy of the keyspace.


		Table: impressions_2
		SSTable count: 22
		Space used (live): 51255709817
		Space used (total): 51255709817
		Space used by snapshots (total): 49415721741
		Off heap memory used (total): 30824975
		SSTable Compression Ratio: 0.20347134631246266
		Number of keys (estimate): 39434
		Memtable cell count: 18279
		Memtable data size: 15897457
		Memtable off heap memory used: 0
		Memtable switch count: 1294
		Local read count: 347016
		Local read latency: 12.573 ms
		Local write count: 109226238
		Local write latency: 0.023 ms
		Pending flushes: 0
		Bloom filter false positives: 655
		Bloom filter false ratio: 0.00000
		Bloom filter space used: 97552
		Bloom filter off heap memory used: 97376
		Index summary off heap memory used: 26719
		Compression metadata off heap memory used: 30700880
		Compacted partition minimum bytes: 311
		Compacted partition maximum bytes: 386857368
		Compacted partition mean bytes: 6424107
		Average live cells per slice (last five minutes): 1027.9502011434631
		Maximum live cells per slice (last five minutes): 5722
		Average tombstones per slice (last five minutes): 1.0
		Maximum tombstones per slice (last five minutes): 1

Re: Approximate row count

Posted by Luke Jolly <lu...@getadmiral.com>.
Is there any other way to get an estimate of rows?
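
One approach sometimes used for this, sketched here under stated assumptions rather than as a recommendation from this thread, is to split the token ring into many small ranges and run a separate COUNT(*) per range, summing the results client-side. The sketch below only builds the CQL strings; it assumes the cluster uses Murmur3Partitioner (token range -2^63 to 2^63-1), and the keyspace name "my_keyspace" and partition key column "pk" are hypothetical placeholders, since neither appears in the original post.

```python
# Sketch: build per-token-range COUNT(*) queries for a table.
# Assumes Murmur3Partitioner; keyspace and partition key names are hypothetical.

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_ranges(splits):
    """Split the full Murmur3 token range into `splits` contiguous ranges."""
    if splits < 1:
        raise ValueError("splits must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (span * i) // splits for i in range(splits)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def count_queries(keyspace, table, partition_key, splits):
    """Return one COUNT(*) statement per token range.

    The first range uses >= so the minimum token is included; every later
    range uses > so that no token is counted twice.
    """
    queries = []
    for i, (start, end) in enumerate(token_ranges(splits)):
        lower = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT COUNT(*) FROM {keyspace}.{table} "
            f"WHERE token({partition_key}) {lower} {start} "
            f"AND token({partition_key}) <= {end}"
        )
    return queries


if __name__ == "__main__":
    # "my_keyspace" and "pk" are placeholders, not names from the thread.
    for query in count_queries("my_keyspace", "impressions_2", "pk", 4):
        print(query)
```

Each statement would then be executed with whatever driver is in use and the counts added up. With partitions as large as the ones in this table, individual range counts may still be slow or time out, so a large number of splits is probably needed.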

On Wed, Jul 27, 2016 at 2:49 PM Chris Lohfink <cl...@gmail.com> wrote:

> The number of keys is the number of *partition keys*, not row keys. You
> have ~39434 partitions, ranging from 311 bytes to 386 MB. Looks like you
> have some wide partitions that contain many of your rows.
>
> Chris Lohfink

Re: Approximate row count

Posted by Chris Lohfink <cl...@gmail.com>.
The number of keys is the number of *partition keys*, not row keys. You
have ~39434 partitions, ranging from 311 bytes to 386 MB. Looks like you
have some wide partitions that contain many of your rows.

Chris Lohfink
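
To make the partition-versus-row distinction concrete, the cfstats figures above can be turned into a rough row estimate. This is only a back-of-the-envelope sketch: it multiplies "Number of keys (estimate)" by "Compacted partition mean bytes" (about 253 GB in total for this table) and divides by an average row size. The average row size is an assumption that has to be supplied by the reader; it does not appear anywhere in the cfstats output, and the mean partition size is itself an estimate.

```python
# Sketch: rough row-count estimate from cfstats partition statistics.
# avg_row_bytes is an assumed value; cfstats does not report it.

def estimate_row_count(partition_count, mean_partition_bytes, avg_row_bytes):
    """Estimate total rows as (partitions * mean partition size) / row size."""
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    total_bytes = partition_count * mean_partition_bytes
    return total_bytes // avg_row_bytes


def implied_row_bytes(partition_count, mean_partition_bytes, row_count):
    """Inverse check: the average row size implied by a known row count."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    return partition_count * mean_partition_bytes / row_count


if __name__ == "__main__":
    partitions = 39434          # Number of keys (estimate)
    mean_bytes = 6424107        # Compacted partition mean bytes

    # Try a few assumed row sizes to see how sensitive the estimate is.
    for assumed_row_bytes in (600, 800, 1200):
        rows = estimate_row_count(partitions, mean_bytes, assumed_row_bytes)
        print(f"{assumed_row_bytes} bytes/row -> about {rows:,} rows")

    # Row sizes implied by the 200-400 million range from the original post.
    for known_rows in (200_000_000, 400_000_000):
        size = implied_row_bytes(partitions, mean_bytes, known_rows)
        print(f"{known_rows:,} rows -> about {size:.0f} bytes/row")
```

The result is only as good as the assumed row size, so it is best treated as a sanity check on the 200-400 million figure rather than as a count.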
