Posted to user@cassandra.apache.org by Jack Krupansky <ja...@gmail.com> on 2016/01/25 01:32:56 UTC

Estimated key count from nodetool tablestats

Does the nodetool tablestats output line for "Number of keys (estimate)"
indicate partition keys or CQL row primary keys (PK)?

We currently don't have doc on this and I couldn't get a solid answer from
a quick examination of the code.

Since it is an estimate, roughly what is the nature of the estimation?

In particular, for a very wide partition with many CQL rows (even millions)
is it estimating that as roughly one key or will the number of sstables
that the partition spans make it a large number?

Thanks.

-- Jack Krupansky

Re: Estimated key count from nodetool tablestats

Posted by Chris Lohfink <cl...@gmail.com>.
It will give you an estimate of the number of partition keys. In newer
versions it merges the per-sstable sketches of the keys and uses HyperLogLog++
<http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/40671.pdf>
(p=13, sp=25) to come up with an estimate of the cardinality. I would say it's
safe to assume that it's within 2-ish% of the actual value. That does not
include the memtable data, however, so that's added on top. As a result, keys
present in both the memtable and the sstables will be double counted. It
should still be a fair estimate.
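
To make the merge-then-add behaviour described above concrete, here is a
minimal sketch in Python. It is an illustration under stated assumptions, not
Cassandra's code: it uses plain HyperLogLog rather than HyperLogLog++ (so
sp=25 is not modeled), SHA-1 as a stand-in hash, and hypothetical names
(sketch_of, merge, estimate) and key ranges chosen only for the example.

```python
import hashlib
import math

P = 13        # precision from the thread (p=13); sp=25 is NOT modeled here
M = 1 << P    # 8192 registers


def hash64(key):
    """Stable 64-bit hash of a key (assumption: a stand-in, not Cassandra's hash)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def sketch_of(keys):
    """Build a plain HyperLogLog register array for one (hypothetical) sstable."""
    regs = [0] * M
    for key in keys:
        h = hash64(key)
        idx = h >> (64 - P)                    # top P bits pick the register
        rest = h & ((1 << (64 - P)) - 1)       # remaining 51 bits
        rank = (64 - P) - rest.bit_length() + 1  # 1-based position of first 1-bit
        if rank > regs[idx]:
            regs[idx] = rank
    return regs


def merge(a, b):
    """Merging sketches is a register-wise max, so repeated keys count once."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(regs):
    """Raw HyperLogLog estimate with the usual small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)
    return raw


def keys(start, stop):
    return [f"pk-{i}" for i in range(start, stop)]


# Three hypothetical sstables with overlapping partition keys (120,000 distinct).
sstables = [
    sketch_of(keys(0, 60_000)),
    sketch_of(keys(30_000, 90_000)),
    sketch_of(keys(60_000, 120_000)),
]
merged = sstables[0]
for s in sstables[1:]:
    merged = merge(merged, s)

sstable_estimate = estimate(merged)              # overlap is counted once
naive_sum = sum(estimate(s) for s in sstables)   # overlap is counted per sstable

# Assumption: the memtable's key count is simply added on top, so the 10,000
# memtable keys that also live in sstables are counted twice.
memtable_keys = keys(110_000, 130_000)
reported = sstable_estimate + len(memtable_keys)

print("merged sstable estimate:", round(sstable_estimate))
print("naive per-sstable sum:  ", round(naive_sum))
print("reported (with memtable):", round(reported), "vs 130000 truly distinct")
```

For plain HyperLogLog the standard error is roughly 1.04/sqrt(M), about 1.15%
with 8192 registers, which is in line with the "2-ish%" figure. The merged
estimate stays near the distinct key count no matter how many sstables a
partition spans, while the memtable addition is where the overcount comes from.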

Before 2.1.6 it used the index instead and could be off by a lot for wide
rows, heavily updated data, or tables with many sstables.

---
Chris Lohfink
