Posted to user@cassandra.apache.org by John Laban <jo...@pagerduty.com> on 2012/03/14 01:05:21 UTC

Re: Composite keys and range queries

Forwarding to the Cassandra mailing list as well, in case this is more of
an issue with how I'm using Cassandra.

Am I correct to assume that I can use range queries on composite row keys,
even when using a RandomPartitioner, if I make sure that the first part of
the composite key is fixed?

Any help would be appreciated,
John



On Tue, Mar 13, 2012 at 12:15 PM, John Laban <jo...@pagerduty.com> wrote:

> Hi,
>
> I have a column family that uses a composite key:
>
> (ID, priority) -> ...
>
> Where the ID is a UUID and the priority is an integer.
>
> I'm trying to perform a range query now:  I want all the rows where the ID
> matches some fixed UUID, but within a range of priorities.  This is
> supported even if I'm using a RandomPartitioner, right?  (Because the first
> key in the composite key is the partition key, and the second part of the
> composite key is automatically ordered?)
>
> So I perform a range slices query:
>
> val rangeQuery = HFactory.createRangeSlicesQuery(keyspace, new CompositeSerializer, StringSerializer.get, BytesArraySerializer.get)
>
> rangeQuery.setColumnFamily(RouteColumnFamilyName).
>             setKeys( new Composite(id, priorityStart), new Composite(id, priorityEnd) ).
>             setRange( null, null, false, Int.MaxValue )
>
>
> But I get this error:
>
> me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:start key's md5 sorts after end key's md5.  this is not allowed; you probably should not specify end key at all, under RandomPartitioner)
>
>
> Shouldn't they have the same md5, since they have the same partition key?
>
> Am I using the wrong query here, or does Hector not support composite range
> queries, or am I making some mistake in how I think Cassandra's composite
> keys work?
>
> Thanks,
> John
>
>

Re: Composite keys and range queries

Posted by aaron morton <aa...@thelastpickle.com>.

>  is there any disadvantage to using supercolumns here? 
There are some: http://wiki.apache.org/cassandra/CassandraLimitations

I would avoid them if you can. The one thing you cannot do when using CompositeTypes for column names is a range delete. If you delete a super column, you delete all of its sub columns. However, if you have a two-part column name, you cannot delete everything that matches "foo:*".
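
To make the "foo:*" limitation concrete, here is a minimal in-memory Scala sketch. It is not Hector or Cassandra code: the row is modelled as a sorted map keyed by a two-part column name, and the names PrefixDeleteSketch and deleteByPrefix are hypothetical. It only illustrates that, without a range delete, the client has to read the matching column names first and then delete each one individually.

```scala
import scala.collection.immutable.TreeMap

object PrefixDeleteSketch {
  // Two-part column name; tuples sort by the first part, then the second.
  type ColumnName = (String, String)

  // Emulates deleting "prefix:*" when no range delete is available.
  def deleteByPrefix(row: TreeMap[ColumnName, String],
                     prefix: String): TreeMap[ColumnName, String] = {
    // Read step: collect every column name whose first part equals the prefix.
    val matching = row.keysIteratorFrom((prefix, "")).takeWhile(_._1 == prefix).toList
    // Delete step: one removal per matching column.
    matching.foldLeft(row)((r, name) => r - name)
  }
}
```

With a super column, by contrast, the whole group of sub columns goes away in a single delete of the super column name.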

> They seem a little cleaner and more straightforward for my use case, since I don't have the advantage of the CQL composite key thing.
If they scratch your itch, grab the 1.1 beta, give them a try, and let us know how they work for you.
http://cassandra.apache.org/download/

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/03/2012, at 10:23 AM, John Laban wrote:

> Ahhh, ok, I thought that CQL was just being brought up to date with the functionality already built into composite keys, but I guess I was mistaken there.  
> 
> But I guess it's just providing a convenient abstraction, using composite column names under the hood.  That's where I was confused, thanks.
> 
> So, in terms of composite column names vs supercolumns:  is the only advantage to composite column names that you can do column slicing on subsets of the "subcolumns"? I.e. if I don't mind loading all of the subcolumns for a given supercolumn name in memory at once (since I need them all anyway), is there any disadvantage to using supercolumns here?  They seem a little cleaner and more straightforward for my use case, since I don't have the advantage of the CQL composite key thing.
> 
> Thanks,
> John
> 
> 
> On Wed, Mar 14, 2012 at 12:53 PM, Jeremiah Jordan <JE...@morningstar.com> wrote:
> Right, so until the new CQL stuff exists to actually query with something smart enough to know about "composite keys" , You have to define and query on your own.
> 
> Row Key = UUID
> Column = CompositeColumn(string, string)
> 
> You want to then use COLUMN slicing, not row ranges to query the data.  Where you slice in priority as the first part of a Composite Column Name.
> 
> See the "Under the hood and historical notes" section of the blog post.  You want to layout your data per the "Physical representation of the denormalized timeline rows" diagram.
> Where your UUID is the "user_id" from the example, and your priority is the "tweet_id"
> 
> -Jeremiah
> 
> 
> From: John Laban [john@pagerduty.com]
> Sent: Wednesday, March 14, 2012 12:37 PM
> To: user@cassandra.apache.org
> Subject: Re: Composite keys and range queries
> 
> Hmm, now I'm really confused.
> 
> > This may be of use to you http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
> 
> This article is what I actually used to come up with my schema here.  In the "Clustering, composite keys, and more" section they're using a schema very similar to how I'm trying to use it.  They define a composite key with two parts, expecting the first part to be used as the partition key and the second part to be used for ordering.
> 
> > The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2) may be 1 .
> 
> Why?  Shouldn't only "uuid-1" be used as the partition key?  (So shouldn't those two hash to the same location?)
> 
> I'm thinking of using supercolumns for this instead as I know they'll work (where the row key is the uuid and the supercolumn name is the priority), but aren't composite row keys supposed to essentially replace the need for supercolumns?
> 
> Thanks, and sorry if I'm getting this all wrong,
> John
> 
> 
> 
> On Wed, Mar 14, 2012 at 12:52 AM, aaron morton <aa...@thelastpickle.com> wrote:
> You are seeing this http://wiki.apache.org/cassandra/FAQ#range_rp
> 
> The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2) may be 1 .
> 
> You cannot do what you want to. Even if you passed a start of (uuid1,<empty>) and no finish, you would not only get rows where the key starts with uuid1.
> 
> This may be of use to you http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
> 
> Or you can store all the priorities that are valid for an ID in another row.
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 14/03/2012, at 1:05 PM, John Laban wrote:
> 
> > Forwarding to the Cassandra mailing list as well, in case this is more of an issue on how I'm using Cassandra.
> >
> > Am I correct to assume that I can use range queries on composite row keys, even when using a RandomPartitioner, if I make sure that the first part of the composite key is fixed?
> >
> > Any help would be appreciated,
> > John
> >
> >
> >
> > On Tue, Mar 13, 2012 at 12:15 PM, John Laban <jo...@pagerduty.com> wrote:
> > Hi,
> >
> > I have a column family that uses a composite key:
> >
> > (ID, priority) -> ...
> >
> > Where the ID is a UUID and the priority is an integer.
> >
> > I'm trying to perform a range query now:  I want all the rows where the ID matches some fixed UUID, but within a range of priorities.  This is supported even if I'm using a RandomPartitioner, right?  (Because the first key in the composite key is the partition key, and the second part of the composite key is automatically ordered?)
> >
> > So I perform a range slices query:
> >
> > val rangeQuery = HFactory.createRangeSlicesQuery(keyspace, new CompositeSerializer, StringSerializer.get, BytesArraySerializer.get)
> > rangeQuery.setColumnFamily(RouteColumnFamilyName).
> >             setKeys( new Composite(id, priorityStart), new Composite(id, priorityEnd) ).
> >             setRange( null, null, false, Int.MaxValue )
> >
> >
> > But I get this error:
> >
> > me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:start key's md5 sorts after end key's md5.  this is not allowed; you probably should not specify end key at all, under RandomPartitioner)
> >
> > Shouldn't they have the same md5, since they have the same partition key?
> >
> > Am I using the wrong query here, or does Hector not support composite range queries, or am I making some mistake in how I think Cassandra's composite keys work?
> >
> > Thanks,
> > John
> >
> >
> 
> 
>

Re: Composite keys and range queries

Posted by John Laban <jo...@pagerduty.com>.

Ahhh, ok, I thought that CQL was just being brought up to date with
the functionality already built into composite keys, but I guess I was
mistaken there.

But I guess it's just providing a convenient abstraction, using composite
column names under the hood.  That's where I was confused, thanks.

So, in terms of composite column names vs supercolumns:  is the only
advantage to composite column names that you can do column slicing on
subsets of the "subcolumns"? I.e. if I don't mind loading all of the
subcolumns for a given supercolumn name in memory at once (since I need
them all anyway), is there any disadvantage to using supercolumns here?
 They seem a little cleaner and more straightforward for my use case, since
I don't have the advantage of the CQL composite key thing.

Thanks,
John


On Wed, Mar 14, 2012 at 12:53 PM, Jeremiah Jordan <
JEREMIAH.JORDAN@morningstar.com> wrote:

>  Right, so until the new CQL stuff exists to actually query with
> something smart enough to know about "composite keys" , You have to define
> and query on your own.
>
> Row Key = UUID
> Column = CompositeColumn(string, string)
>
> You want to then use COLUMN slicing, not row ranges to query the data.
> Where you slice in priority as the first part of a Composite Column Name.
>
> See the "Under the hood and historical notes" section of the blog post.
> You want to layout your data per the "Physical representation of the
> denormalized timeline rows" diagram.
> Where your UUID is the "user_id" from the example, and your priority is
> the "tweet_id"
>
> -Jeremiah
>
>
>  ------------------------------
> *From:* John Laban [john@pagerduty.com]
> *Sent:* Wednesday, March 14, 2012 12:37 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Composite keys and range queries
>
>   Hmm, now I'm really confused.
>
>  > This may be of use to you
> http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
>
>  This article is what I actually used to come up with my schema here.  In
> the "Clustering, composite keys, and more" section they're using a schema
> very similar to how I'm trying to use it.  They define a composite key
> with two parts, expecting the first part to be used as the partition key
> and the second part to be used for ordering.
>
>  > The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2)
> may be 1 .
>
>  Why?  Shouldn't only "uuid-1" be used as the partition key?  (So
> shouldn't those two hash to the same location?)
>
>  I'm thinking of using supercolumns for this instead as I know they'll
> work (where the row key is the uuid and the supercolumn name is the
> priority), but aren't composite row keys supposed to essentially replace
> the need for supercolumns?
>
>  Thanks, and sorry if I'm getting this all wrong,
> John
>
>
>
> On Wed, Mar 14, 2012 at 12:52 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> You are seeing this http://wiki.apache.org/cassandra/FAQ#range_rp
>>
>> The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2) may
>> be 1 .
>>
>> You cannot do what you want to. Even if you passed a start of
>> (uuid1,<empty>) and no finish, you would not only get rows where the key
>> starts with uuid1.
>>
>> This may be of use to you
>> http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
>>
>> Or you can store all the priorities that are valid for an ID in another
>> row.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 14/03/2012, at 1:05 PM, John Laban wrote:
>>
>> > Forwarding to the Cassandra mailing list as well, in case this is more
>> of an issue on how I'm using Cassandra.
>> >
>> > Am I correct to assume that I can use range queries on composite row
>> keys, even when using a RandomPartitioner, if I make sure that the first
>> part of the composite key is fixed?
>> >
>> > Any help would be appreciated,
>> > John
>> >
>> >
>> >
>> > On Tue, Mar 13, 2012 at 12:15 PM, John Laban <jo...@pagerduty.com>
>> wrote:
>> > Hi,
>> >
>> > I have a column family that uses a composite key:
>> >
>> > (ID, priority) -> ...
>> >
>> > Where the ID is a UUID and the priority is an integer.
>> >
>> > I'm trying to perform a range query now:  I want all the rows where the
>> ID matches some fixed UUID, but within a range of priorities.  This is
>> supported even if I'm using a RandomPartitioner, right?  (Because the first
>> key in the composite key is the partition key, and the second part of the
>> composite key is automatically ordered?)
>> >
>> > So I perform a range slices query:
>> >
>> > val rangeQuery = HFactory.createRangeSlicesQuery(keyspace, new
>> CompositeSerializer, StringSerializer.get, BytesArraySerializer.get)
>> > rangeQuery.setColumnFamily(RouteColumnFamilyName).
>> >             setKeys( new Composite(id, priorityStart), new
>> Composite(id, priorityEnd) ).
>> >             setRange( null, null, false, Int.MaxValue )
>> >
>> >
>> > But I get this error:
>> >
>> > me.prettyprint.hector.api.exceptions.HInvalidRequestException:
>> InvalidRequestException(why:start key's md5 sorts after end key's md5.
>>  this is not allowed; you probably should not specify end key at all, under
>> RandomPartitioner)
>> >
>> > Shouldn't they have the same md5, since they have the same partition
>> key?
>> >
>> > Am I using the wrong query here, or does Hector not support composite
>> range queries, or am I making some mistake in how I think Cassandra's
>> composite keys work?
>> >
>> > Thanks,
>> > John
>> >
>> >
>>
>>
>

RE: Composite keys and range queries

Posted by Jeremiah Jordan <JE...@morningstar.com>.

Right, so until the new CQL stuff exists to actually query with something smart enough to know about "composite keys", you have to define and query on your own.

Row Key = UUID
Column = CompositeColumn(string, string)

You then want to use COLUMN slicing, not row ranges, to query the data, slicing on priority as the first part of a Composite Column Name.

See the "Under the hood and historical notes" section of the blog post.  You want to lay out your data per the "Physical representation of the denormalized timeline rows" diagram,
where your UUID is the "user_id" from the example and your priority is the "tweet_id".
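
The layout described here can be sketched with a small in-memory model in Scala. This is not Hector code and makes no claim about its API: the row key is the UUID alone, each row is a sorted map whose column names are (priority, label) pairs, and the names WideRowSketch, ColumnName, and slice are hypothetical. Because column names stay sorted inside a row, a range of priorities is one contiguous slice, whichever partitioner places the row.

```scala
import java.util.UUID
import scala.collection.immutable.TreeMap

object WideRowSketch {
  // Column name = (priority, label); tuples sort by priority first, then label.
  type ColumnName = (Int, String)
  type Row = TreeMap[ColumnName, String]

  // Row key (the UUID only) -> one wide row with columns sorted by name.
  type ColumnFamily = Map[UUID, Row]

  // Columns of a single row with priorityStart <= priority <= priorityEnd.
  // Assumes priorityEnd < Int.MaxValue so that priorityEnd + 1 cannot overflow.
  def slice(cf: ColumnFamily, id: UUID, priorityStart: Int, priorityEnd: Int): Row =
    cf.getOrElse(id, TreeMap.empty[ColumnName, String])
      .range((priorityStart, ""), (priorityEnd + 1, ""))
}
```

Only the UUID takes part in locating the row, so the query touches exactly one row and never needs a row range.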

-Jeremiah

________________________________
From: John Laban [john@pagerduty.com]
Sent: Wednesday, March 14, 2012 12:37 PM
To: user@cassandra.apache.org
Subject: Re: Composite keys and range queries

Hmm, now I'm really confused.

> This may be of use to you http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

This article is what I actually used to come up with my schema here.  In the "Clustering, composite keys, and more" section they're using a schema very similar to how I'm trying to use it.  They define a composite key with two parts, expecting the first part to be used as the partition key and the second part to be used for ordering.

> The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2) may be 1 .

Why?  Shouldn't only "uuid-1" be used as the partition key?  (So shouldn't those two hash to the same location?)

I'm thinking of using supercolumns for this instead as I know they'll work (where the row key is the uuid and the supercolumn name is the priority), but aren't composite row keys supposed to essentially replace the need for supercolumns?

Thanks, and sorry if I'm getting this all wrong,
John

On Wed, Mar 14, 2012 at 12:52 AM, aaron morton <aa...@thelastpickle.com> wrote:
You are seeing this http://wiki.apache.org/cassandra/FAQ#range_rp

The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2) may be 1 .

You cannot do what you want to. Even if you passed a start of (uuid1,<empty>) and no finish, you would not only get rows where the key starts with uuid1.

This may be of use to you http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

Or you can store all the priorities that are valid for an ID in another row.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/03/2012, at 1:05 PM, John Laban wrote:

> Forwarding to the Cassandra mailing list as well, in case this is more of an issue on how I'm using Cassandra.
>
> Am I correct to assume that I can use range queries on composite row keys, even when using a RandomPartitioner, if I make sure that the first part of the composite key is fixed?
>
> Any help would be appreciated,
> John
>
>
>
> On Tue, Mar 13, 2012 at 12:15 PM, John Laban <jo...@pagerduty.com> wrote:
> Hi,
>
> I have a column family that uses a composite key:
>
> (ID, priority) -> ...
>
> Where the ID is a UUID and the priority is an integer.
>
> I'm trying to perform a range query now:  I want all the rows where the ID matches some fixed UUID, but within a range of priorities.  This is supported even if I'm using a RandomPartitioner, right?  (Because the first key in the composite key is the partition key, and the second part of the composite key is automatically ordered?)
>
> So I perform a range slices query:
>
> val rangeQuery = HFactory.createRangeSlicesQuery(keyspace, new CompositeSerializer, StringSerializer.get, BytesArraySerializer.get)
> rangeQuery.setColumnFamily(RouteColumnFamilyName).
>             setKeys( new Composite(id, priorityStart), new Composite(id, priorityEnd) ).
>             setRange( null, null, false, Int.MaxValue )
>
>
> But I get this error:
>
> me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:start key's md5 sorts after end key's md5.  this is not allowed; you probably should not specify end key at all, under RandomPartitioner)
>
> Shouldn't they have the same md5, since they have the same partition key?
>
> Am I using the wrong query here, or does Hector not support composite range queries, or am I making some mistake in how I think Cassandra's composite keys work?
>
> Thanks,
> John
>
>

Re: Composite keys and range queries

Posted by John Laban <jo...@pagerduty.com>.

Hmm, now I'm really confused.

> This may be of use to you
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

This article is what I actually used to come up with my schema here.  In
the "Clustering, composite keys, and more" section they're using a schema
very similar to how I'm trying to use it.  They define a composite key
with two parts, expecting the first part to be used as the partition key
and the second part to be used for ordering.

> The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2) may
be 1 .

Why?  Shouldn't only "uuid-1" be used as the partition key?  (So shouldn't
those two hash to the same location?)

I'm thinking of using supercolumns for this instead as I know they'll work
(where the row key is the uuid and the supercolumn name is the priority),
but aren't composite row keys supposed to essentially replace the need for
supercolumns?

Thanks, and sorry if I'm getting this all wrong,
John



On Wed, Mar 14, 2012 at 12:52 AM, aaron morton <aa...@thelastpickle.com> wrote:

> You are seeing this http://wiki.apache.org/cassandra/FAQ#range_rp
>
> The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2) may be
> 1 .
>
> You cannot do what you want to. Even if you passed a start of
> (uuid1,<empty>) and no finish, you would not only get rows where the key
> starts with uuid1.
>
> This may be of use to you
> http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
>
> Or you can store all the priorities that are valid for an ID in another
> row.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 14/03/2012, at 1:05 PM, John Laban wrote:
>
> > Forwarding to the Cassandra mailing list as well, in case this is more
> of an issue on how I'm using Cassandra.
> >
> > Am I correct to assume that I can use range queries on composite row
> keys, even when using a RandomPartitioner, if I make sure that the first
> part of the composite key is fixed?
> >
> > Any help would be appreciated,
> > John
> >
> >
> >
> > On Tue, Mar 13, 2012 at 12:15 PM, John Laban <jo...@pagerduty.com> wrote:
> > Hi,
> >
> > I have a column family that uses a composite key:
> >
> > (ID, priority) -> ...
> >
> > Where the ID is a UUID and the priority is an integer.
> >
> > I'm trying to perform a range query now:  I want all the rows where the
> ID matches some fixed UUID, but within a range of priorities.  This is
> supported even if I'm using a RandomPartitioner, right?  (Because the first
> key in the composite key is the partition key, and the second part of the
> composite key is automatically ordered?)
> >
> > So I perform a range slices query:
> >
> > val rangeQuery = HFactory.createRangeSlicesQuery(keyspace, new
> CompositeSerializer, StringSerializer.get, BytesArraySerializer.get)
> > rangeQuery.setColumnFamily(RouteColumnFamilyName).
> >             setKeys( new Composite(id, priorityStart), new Composite(id,
> priorityEnd) ).
> >             setRange( null, null, false, Int.MaxValue )
> >
> >
> > But I get this error:
> >
> > me.prettyprint.hector.api.exceptions.HInvalidRequestException:
> InvalidRequestException(why:start key's md5 sorts after end key's md5.
>  this is not allowed; you probably should not specify end key at all, under
> RandomPartitioner)
> >
> > Shouldn't they have the same md5, since they have the same partition key?
> >
> > Am I using the wrong query here, or does Hector not support composite
> range queries, or am I making some mistake in how I think Cassandra's
> composite keys work?
> >
> > Thanks,
> > John
> >
> >
>
>

Re: Composite keys and range queries

Posted by aaron morton <aa...@thelastpickle.com>.

You are seeing this http://wiki.apache.org/cassandra/FAQ#range_rp

The hash for (uuid-1 , p1) may be 100 and the hash for (uuid-1, p2) may be 1 .

You cannot do what you want to. Even if you passed a start of (uuid1,<empty>) and no finish, you would get more than just the rows whose key starts with uuid1.
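
As an illustration of why the two keys end up in unrelated places, here is a minimal Scala sketch. It assumes, as the error message itself suggests, that RandomPartitioner orders rows by the MD5 of the entire row key. The 20-byte key encoding and the names TokenSketch, rowKey, and token are hypothetical; this is not Hector's or Cassandra's actual composite serialization.

```scala
import java.math.BigInteger
import java.nio.ByteBuffer
import java.security.MessageDigest
import java.util.UUID

object TokenSketch {
  // Hypothetical encoding: 16 bytes of UUID followed by a 4-byte priority.
  def rowKey(id: UUID, priority: Int): Array[Byte] =
    ByteBuffer.allocate(20)
      .putLong(id.getMostSignificantBits)
      .putLong(id.getLeastSignificantBits)
      .putInt(priority)
      .array()

  // MD5 over the whole key, read as a non-negative integer.
  def token(key: Array[Byte]): BigInteger =
    new BigInteger(MessageDigest.getInstance("MD5").digest(key)).abs()
}
```

Calling TokenSketch.token on rowKey(id, 1) and rowKey(id, 2) for the same id gives two tokens with no predictable ordering between them, because the priority bytes are hashed together with the UUID rather than kept aside for sorting.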

This may be of use to you http://www.datastax.com/dev/blog/schema-in-cassandra-1-1

Or you can store all the priorities that are valid for an ID in another row. 
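
A rough in-memory sketch of this "index row" idea in Scala, under stated assumptions: the index is modelled as a map from ID to a sorted set of priorities, the data rows as a map keyed by (ID, priority), and the names PriorityIndexSketch, keysInRange, and fetch are hypothetical. In a real client the second step would be a multiget on exact row keys, which works under RandomPartitioner because no row range is involved.

```scala
import java.util.UUID
import scala.collection.immutable.TreeSet

object PriorityIndexSketch {
  type RowKey = (UUID, Int)

  // Step 1: read the index row for the ID and keep the priorities in range.
  def keysInRange(index: Map[UUID, TreeSet[Int]],
                  id: UUID, start: Int, end: Int): List[RowKey] =
    index.getOrElse(id, TreeSet.empty[Int])
      .iterator
      .filter(p => p >= start && p <= end)
      .map(p => (id, p))
      .toList

  // Step 2: fetch each row by its exact key; missing rows are skipped.
  def fetch[V](rows: Map[RowKey, V], keys: List[RowKey]): List[(RowKey, V)] =
    keys.flatMap(k => rows.get(k).map(v => k -> v))
}
```

The cost of this design is that every write to a data row also has to update the index row, and the two can drift apart if one of the writes fails.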

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/03/2012, at 1:05 PM, John Laban wrote:

> Forwarding to the Cassandra mailing list as well, in case this is more of an issue on how I'm using Cassandra.
> 
> Am I correct to assume that I can use range queries on composite row keys, even when using a RandomPartitioner, if I make sure that the first part of the composite key is fixed?
> 
> Any help would be appreciated,
> John
> 
> 
> 
> On Tue, Mar 13, 2012 at 12:15 PM, John Laban <jo...@pagerduty.com> wrote:
> Hi,
> 
> I have a column family that uses a composite key:
> 
> (ID, priority) -> ...
> 
> Where the ID is a UUID and the priority is an integer.
> 
> I'm trying to perform a range query now:  I want all the rows where the ID matches some fixed UUID, but within a range of priorities.  This is supported even if I'm using a RandomPartitioner, right?  (Because the first key in the composite key is the partition key, and the second part of the composite key is automatically ordered?)
> 
> So I perform a range slices query:
> 
> val rangeQuery = HFactory.createRangeSlicesQuery(keyspace, new CompositeSerializer, StringSerializer.get, BytesArraySerializer.get)
> rangeQuery.setColumnFamily(RouteColumnFamilyName).
>             setKeys( new Composite(id, priorityStart), new Composite(id, priorityEnd) ).
>             setRange( null, null, false, Int.MaxValue )
> 
> 
> But I get this error:
> 
> me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:start key's md5 sorts after end key's md5.  this is not allowed; you probably should not specify end key at all, under RandomPartitioner)
> 
> Shouldn't they have the same md5, since they have the same partition key?  
> 
> Am I using the wrong query here, or does Hector not support composite range queries, or am I making some mistake in how I think Cassandra's composite keys work?
> 
> Thanks,
> John
> 
>