Posted to user@cassandra.apache.org by Avi Levi <av...@indeni.com> on 2017/08/18 13:03:42 UTC

Getting all unique keys

Hi

what is the most efficient way to get a distinct key list from a big table
(approx. 20 million inserts per minute)?

equivalent to "select distinct key from my_table" for this table:

    CREATE TABLE my_table (
        key text,
        timestamp bigint,
        value double,
        PRIMARY KEY (key, timestamp)
    )

I need to execute this query quite often (every couple of minutes).

I can of course maintain a separate table holding only the unique set of
keys, but that is error prone, so I would rather avoid it. Still, it is an option.
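For what it's worth, a minimal sketch of that companion table (the table
name here is hypothetical). Since CQL inserts are upserts, re-writing an
existing key alongside every insert into my_table is harmless:

    CREATE TABLE my_table_keys (
        key text PRIMARY KEY
    );

    -- written alongside every insert into my_table
    INSERT INTO my_table_keys (key) VALUES ('some-key');

    -- the distinct key list is then a plain read of the small table
    SELECT key FROM my_table_keys;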

Cheers

Avi

Re: Getting all unique keys

Posted by kurt greaves <ku...@instaclustr.com>.
You can SELECT DISTINCT in CQL, however I would recommend against such a
pattern as it is very unlikely to be efficient, and prone to errors. A
distinct query will search every partition for the first live cell, which
could be buried behind a lot of tombstones. It's safe to say at some point
you will run into serious issues. Selecting all the keys in a table purely
to see what exists is not going to be cheap, and sounds awfully like an
anti-pattern. Why do you need this behaviour?
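If you do end up running it anyway, one common mitigation is to split the
scan across token ranges so each query touches a bounded slice of the ring
rather than every partition at once. A sketch (the range bounds are
placeholders; in practice a driver or tool would iterate over the ranges):

    SELECT DISTINCT key FROM my_table
    WHERE token(key) > -9223372036854775808
      AND token(key) <= -4611686018427387904;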

Re: Getting all unique keys

Posted by Sruti S <sr...@gmail.com>.
hi:

Is this sensor data, hence the timestamp? How are you generating this 'key'
field? Can you have only the 'key' field as the primary key? Even if not,
since that field is part of the PK, such queries may be reasonably fast.

However, are there other attributes that could be added to define unique
business keys? If so, you can make those attributes part of the primary key
so an index is used.

It would also help to understand why the list of distinct keys is needed, and
at this constant query rate.

Do you need to ensure the same key is not reused? If so, make your key
field a PK or add a uniqueness constraint, so duplicate inserts will fail.
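One caveat: in CQL a plain INSERT is an upsert and will not fail on a
duplicate. The usual way to get insert-if-absent semantics is a lightweight
transaction (the table name here is hypothetical):

    INSERT INTO my_table_keys (key) VALUES ('some-key') IF NOT EXISTS;
    -- the result row's [applied] column is false when the key already exists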

If you *must* query constantly, you can also consider using a secondary
index to help speed it up.
HTH.
Look forward to further clarification.


Friday, August 18, 2017, Avi Levi <av...@indeni.com> wrote:

> [original message quoted above]