Posted to user@cassandra.apache.org by Luke Jolly <lu...@getadmiral.com> on 2016/07/27 18:44:30 UTC
Approximate row count
I have a table in which I'm storing ad impression data, with each row being
one impression. I want to get a count of total rows / impressions. I know
that there are in the ballpark of 200-400 million rows in this table, and
from my reading, "Number of keys" in the output of cfstats should be a
reasonably accurate estimate. However, it is 39434. Am I misunderstanding
something? Every node in my cluster has a complete copy of the keyspace.
Table: impressions_2
SSTable count: 22
Space used (live): 51255709817
Space used (total): 51255709817
Space used by snapshots (total): 49415721741
Off heap memory used (total): 30824975
SSTable Compression Ratio: 0.20347134631246266
Number of keys (estimate): 39434
Memtable cell count: 18279
Memtable data size: 15897457
Memtable off heap memory used: 0
Memtable switch count: 1294
Local read count: 347016
Local read latency: 12.573 ms
Local write count: 109226238
Local write latency: 0.023 ms
Pending flushes: 0
Bloom filter false positives: 655
Bloom filter false ratio: 0.00000
Bloom filter space used: 97552
Bloom filter off heap memory used: 97376
Index summary off heap memory used: 26719
Compression metadata off heap memory used: 30700880
Compacted partition minimum bytes: 311
Compacted partition maximum bytes: 386857368
Compacted partition mean bytes: 6424107
Average live cells per slice (last five minutes): 1027.9502011434631
Maximum live cells per slice (last five minutes): 5722
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Re: Approximate row count
Posted by Luke Jolly <lu...@getadmiral.com>.
Is there any other way to get an estimate of rows?
On Wed, Jul 27, 2016 at 2:49 PM Chris Lohfink <cl...@gmail.com> wrote:
> The number of keys is the number of *partition keys*, not the number of
> rows. You have ~39434 partitions, ranging from 311 bytes to roughly 387 MB.
> It looks like you have some wide partitions that contain many of your rows.
>
> Chris Lohfink
>
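One possible answer to the follow-up question, offered as a sketch rather than anything proposed in this thread: split the token ring into small ranges and run a bounded COUNT(*) per range, either over all ranges for a full count or over a sample of them to extrapolate. The code below only builds the CQL strings. It assumes the default Murmur3Partitioner (tokens from -2^63 to 2^63-1), and the keyspace name `my_keyspace` and partition key column `partition_key_col` are hypothetical placeholders, since the table schema is not shown here.

```python
# Token bounds for Murmur3Partitioner (ASSUMPTION: the cluster uses the default partitioner).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_ring(num_splits):
    """Return num_splits contiguous, non-overlapping inclusive (lo, hi) token ranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN + 1  # total number of token values, 2**64
    ranges = []
    lo = MIN_TOKEN
    for i in range(1, num_splits + 1):
        # Integer arithmetic keeps the boundaries exact; the last hi equals MAX_TOKEN.
        hi = MIN_TOKEN + (span * i) // num_splits - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def count_queries(keyspace, table, partition_key, num_splits):
    """Build one bounded COUNT(*) statement per token range."""
    template = (
        "SELECT COUNT(*) FROM {ks}.{tbl} "
        "WHERE token({pk}) >= {lo} AND token({pk}) <= {hi}"
    )
    return [
        template.format(ks=keyspace, tbl=table, pk=partition_key, lo=lo, hi=hi)
        for lo, hi in split_token_ring(num_splits)
    ]


# Hypothetical keyspace and partition key column; only the table name is from the thread.
queries = count_queries("my_keyspace", "impressions_2", "partition_key_col", 4)
for q in queries:
    print(q)
```

With partitions as large as the ones reported here, each range would likely need to be quite narrow (many splits) to stay within read timeouts, and a composite partition key would need all of its columns listed inside token(...).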
Re: Approximate row count
Posted by Chris Lohfink <cl...@gmail.com>.
The number of keys is the number of *partition keys*, not the number of
rows. You have ~39434 partitions, ranging from 311 bytes to roughly 387 MB.
It looks like you have some wide partitions that contain many of your rows.
Chris Lohfink
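As a rough cross-check of the explanation above, the cfstats figures can be combined arithmetically: partitions times mean partition size gives an approximate uncompressed data size, and multiplying by the compression ratio should land near "Space used (live)". Dividing the uncompressed size by an average row size then gives a ballpark row count. This is only a sketch: the 800-byte average row size is a made-up placeholder (it does not come from the thread and would need to be measured on a sample), and the cfstats values are themselves estimates.

```python
# Figures copied from the cfstats output quoted in this thread.
NUM_PARTITIONS = 39434                    # "Number of keys (estimate)"
MEAN_PARTITION_BYTES = 6424107            # "Compacted partition mean bytes"
COMPRESSION_RATIO = 0.20347134631246266   # "SSTable Compression Ratio"
SPACE_USED_LIVE = 51255709817             # "Space used (live)"

# ASSUMPTION: hypothetical average uncompressed row size in bytes.
# Not from the thread; replace with a value measured on real data.
ASSUMED_AVG_ROW_BYTES = 800


def uncompressed_bytes(partitions, mean_partition_bytes):
    """Approximate total uncompressed data size on this node."""
    return partitions * mean_partition_bytes


def estimate_rows(partitions, mean_partition_bytes, avg_row_bytes):
    """Ballpark row count: total uncompressed bytes divided by bytes per row."""
    return uncompressed_bytes(partitions, mean_partition_bytes) // avg_row_bytes


total = uncompressed_bytes(NUM_PARTITIONS, MEAN_PARTITION_BYTES)
predicted_on_disk = total * COMPRESSION_RATIO
rows = estimate_rows(NUM_PARTITIONS, MEAN_PARTITION_BYTES, ASSUMED_AVG_ROW_BYTES)

print("uncompressed bytes (approx):", total)
print("predicted on-disk bytes:", round(predicted_on_disk))
print("reported live bytes:", SPACE_USED_LIVE)
print("row estimate at assumed row size:", rows)
```

The predicted on-disk size comes out within about one percent of the reported live space, which supports reading "Number of keys" as a partition count. The row estimate is only as good as the assumed row size.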