Posted to user@cassandra.apache.org by Maxim Veksler <ma...@vekslers.org> on 2011/01/20 00:07:16 UTC

Does Cassandra support range queries on keys ?

Hello everyone,

I'm new to dynamo. I'm looking to implement something similar to prefix
search for keys (much like S3 allows you to list all the keys that match a
certain prefix).

Can I implement this with Cassandra? I'm using Hector as the client but
would gladly go to Thrift if necessary.


Thank you,
Maxim.

Re: Does Cassandra support range queries on keys ?

Posted by Aaron Morton <aa...@thelastpickle.com>.

I found this on the wiki; it may be useful: http://wiki.apache.org/cassandra/LargeDataSetConsiderations

Aaron

On 24 Jan, 2011, at 09:26 PM, Peter Schuller <pe...@infidyne.com> wrote:

> Following your suggestion of using the key of a super column as the range
> token, won't I have a storage problem?

You won't get me to proclaim that you won't have a storage problem ;)

If you're going to deploy this at scale, I'm sure you'll have problems
whatever you do...

> I couldn't find information about this so I'll just ask: if I have a
> (Super/)ColumnFamily that contains 1 "key" for the row but that row contains
> millions of k:v entries, would that be an efficient Cassandra design?
> Does Cassandra store a CF row on a single node or can it / should it
> distribute this data?
> Would having millions of k:v entries in a single row of a CF be
> considered good practice (in terms of query time, range scans, and so on)?

The replication set/distribution is on a per-row basis, so you
generally don't want individual rows to be a significant part of the
entire data set.

You definitely don't want super columns that are huge; the columns within
an individual super column aren't indexed on disk, for one thing.

Having large rows with lots of columns... maybe. In general it's
certainly supported, but as for the overall impact if you intend to have
relatively few rows, all of them very large - I don't want to say too
much here. Anyone else? (anti-entropy granularity, compaction
in-memory thresholds and GC tweaking, etc.)

-- 
/ Peter Schuller

Re: Does Cassandra support range queries on keys ?

Posted by Peter Schuller <pe...@infidyne.com>.

> Following your suggestion of using the key of a super column as the range
> token, won't I have a storage problem?

You won't get me to proclaim that you won't have a storage problem ;)

If you're going to deploy this at scale, I'm sure you'll have problems
whatever you do...

> I couldn't find information about this so I'll just ask: if I have a
> (Super/)ColumnFamily that contains 1 "key" for the row but that row contains
> millions of k:v entries, would that be an efficient Cassandra design?
> Does Cassandra store a CF row on a single node or can it / should it
> distribute this data?
> Would having millions of k:v entries in a single row of a CF be
> considered good practice (in terms of query time, range scans, and so on)?

The replication set/distribution is on a per-row basis, so you
generally don't want individual rows to be a significant part of the
entire data set.

You definitely don't want super columns that are huge; the columns within
an individual super column aren't indexed on disk, for one thing.

Having large rows with lots of columns... maybe. In general it's
certainly supported, but as for the overall impact if you intend to have
relatively few rows, all of them very large - I don't want to say too
much here. Anyone else? (anti-entropy granularity, compaction
in-memory thresholds and GC tweaking, etc.)

-- 
/ Peter Schuller
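
A rough illustration of the per-row placement point above, written as a toy Python model rather than Cassandra code. It assumes hash-based placement in the style of the random partitioner (an MD5-derived token on a ring); the node names, the four-node ring, and the helper names `token_for` and `replicas` are all hypothetical. Because the replica set is derived from the row key alone, every column of a row lands on the same nodes, however many columns the row holds.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # assumed token space for this toy model

# Hypothetical four-node ring with evenly spaced tokens, sorted by token.
RING = [(i * RING_SIZE // 4, name)
        for i, name in enumerate(["node-a", "node-b", "node-c", "node-d"])]


def token_for(row_key):
    """Map a row key to a position on the ring via an MD5 digest."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas(row_key, ring=RING, replication_factor=3):
    """Return the nodes holding a row: the first node whose token is
    >= the key's token, then the next nodes clockwise (simplified)."""
    tokens = [token for token, _ in ring]
    first = bisect_left(tokens, token_for(row_key)) % len(ring)
    return [ring[(first + i) % len(ring)][1]
            for i in range(replication_factor)]


# Placement depends only on the row key, never on the row's columns, so a
# row with millions of columns is stored whole on each of these replicas.
big_row_replicas = replicas("user-index")
```

In this model, a handful of very large rows means a handful of replica sets carry most of the data, which is the imbalance the reply warns about.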

Re: Does Cassandra support range queries on keys ?

Posted by Maxim Veksler <ma...@vekslers.org>.

Thank you Peter.

Following your suggestion of using the key of a super column as the range
token, won't I have a storage problem?

I couldn't find information about this so I'll just ask: if I have a
(Super/)ColumnFamily that contains 1 "key" for the row but that row contains
millions of k:v entries, would that be an efficient Cassandra design?

Does Cassandra store a CF row on a single node or can it / should it
distribute this data?
Would having millions of k:v entries in a single row of a CF be
considered good practice (in terms of query time, range scans, and so on)?

Thank you,
Maxim.

On Sun, Jan 23, 2011 at 11:14 AM, Peter Schuller <peter.schuller@infidyne.com> wrote:

> > I'm new to dynamo. I'm looking to implement something similar to prefix
> > search for keys (much like S3 allows you to list all the keys that match a
> > certain prefix).
> > Can I implement this with Cassandra? I'm using Hector as the client but
> > would gladly go to Thrift if necessary.
>
> Range queries on keys are possible when using the order preserving
> partitioner; see the partitioner section of:
>
>  http://wiki.apache.org/cassandra/StorageConfiguration
>
> In addition, range slicing is supported within a single row, based on
> column names, independently of which partitioner is used. Depending on
> whether your data is such that you can suitably put whatever is to be
> range queried over into a single row, that may allow you to do this
> without using the order preserving partitioner (which has some
> downsides in that ring balancing becomes more difficult).
>
> --
> / Peter Schuller
>

Re: Does Cassandra support range queries on keys ?

Posted by Peter Schuller <pe...@infidyne.com>.

> I'm new to dynamo. I'm looking to implement something similar to prefix
> search for keys (much like S3 allows you to list all the keys that match a
> certain prefix).
> Can I implement this with Cassandra? I'm using Hector as the client but
> would gladly go to Thrift if necessary.

Range queries on keys are possible when using the order preserving
partitioner; see the partitioner section of:

  http://wiki.apache.org/cassandra/StorageConfiguration

In addition, range slicing is supported within a single row, based on
column names, independently of which partitioner is used. Depending on
whether your data is such that you can suitably put whatever is to be
range queried over into a single row, that may allow you to do this
without using the order preserving partitioner (which has some
downsides in that ring balancing becomes more difficult).

-- 
/ Peter Schuller
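
To make the single-row approach above concrete, here is a minimal Python sketch of a prefix search expressed as a range slice over sorted column names. It is a toy in-memory model, not Hector or Thrift code: the function `prefix_slice`, the sample column names, and the `limit` parameter are hypothetical, and it assumes the column comparator orders names the way plain string comparison does.

```python
from bisect import bisect_left


def prefix_slice(sorted_names, prefix, limit=100):
    """Return up to `limit` names that start with `prefix`.

    `sorted_names` must already be sorted, as column names within a
    row are kept in comparator order.
    """
    start = bisect_left(sorted_names, prefix)  # first name >= prefix
    result = []
    for name in sorted_names[start:start + limit]:
        if not name.startswith(prefix):
            break  # sorted order: no later name can match
        result.append(name)
    return result


# Hypothetical row in which each column name is an S3-style object key.
row_columns = sorted([
    "logs/2011/01/19.gz",
    "logs/2011/01/20.gz",
    "photos/cat.jpg",
    "photos/dog.jpg",
    "readme.txt",
])

matches = prefix_slice(row_columns, "photos/")
```

The same idea carries over to row keys only when keys are stored in sorted order, which is why key range queries depend on the order preserving partitioner while column slices do not.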