Posted to user@cassandra.apache.org by Cyril Auburtin <cy...@gmail.com> on 2012/05/29 12:08:41 UTC

About Composite range queries

How does Cassandra make it possible to run a range query on a composite key?

"key1" => (A:A:C), (A:B:C), (A:C:C), (A:D:C), (B:A:C)

like get_range ("key1", start_column=(A,"), end_column=(A, C)); will return
[ (A:B:C), (A:C:C) ] (in pycassa)

Does the composite implementation add much overhead to make this work?
Does it need additional column families to support range queries across
the individual components of the composite (its first, second and third
parts)?

What is the real advantage compared to super column families?

"key1" => A: (A,C), (B,C), (C,C), (D,C)  , B: (A,C)

thx

Re: About Composite range queries

Posted by Cyril Auburtin <cy...@gmail.com>.
OK, sorry, I thought the column names inside a row were hashed as well.
So they are just stored as raw bytes.

thx

Re: About Composite range queries

Posted by aaron morton <aa...@thelastpickle.com>.
> If you hash 4 composite keys, say ('A','B','C'), ('A','D','C'), ('A','E','X'), ('A','R','X'), do you get only 4 hashes, or more?

Four.

> If it's 4, how are you able to run a range query, for example between start_column=('A', 'D') and end_column=('A','E'), and get the column ('A','D','C')?

That's a slice query against columns; column names are not hashed. The columns in a row are sorted by name according to the comparator, which can differ from raw byte order.

A range query is against rows. Row keys are hashed (when using the RandomPartitioner) to create tokens, and rows are stored in token order.
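
A minimal sketch of the slice case, assuming plain Python tuples as a stand-in for composite column names and ordinary tuple comparison as a stand-in for the comparator. The helper name slice_columns is hypothetical and is not part of pycassa or Cassandra. The point is only that a slice works on names that are already sorted, so no hashing and no extra keys are involved:

```python
from bisect import bisect_left, bisect_right

# Column names of one row, kept sorted the way a comparator would keep them.
# Tuples compare component by component, and a shorter prefix sorts first.
row = sorted([("A", "B", "C"), ("A", "D", "C"), ("A", "E", "X"), ("A", "R", "X")])

def slice_columns(sorted_names, start, end):
    """Return the names between start and end (hypothetical helper)."""
    lo = bisect_left(sorted_names, start)   # first name >= start
    hi = bisect_right(sorted_names, end)    # one past the last name <= end
    return sorted_names[lo:hi]

result = slice_columns(row, ("A", "D"), ("A", "E"))
print(result)
```

Because ('A','E') is a prefix that sorts before ('A','E','X'), only ('A','D','C') falls inside the bounds in this simplified model.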

> The composites are like chapters within the whole key set, so there must be intermediate keys added?

Not sure what you mean. 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

Re: About Composite range queries

Posted by Cyril Auburtin <cy...@gmail.com>.
Sorry, but I don't understand.

If you hash 4 composite keys, say
('A','B','C'), ('A','D','C'), ('A','E','X'), ('A','R','X'), do you get only 4
hashes, or more?

If it's 4, how are you able to run a range query, for example between
start_column=('A', 'D') and end_column=('A','E'), and get the column
('A','D','C')?

The composites are like chapters within the whole key set, so there must be
intermediate keys added?


Re: About Composite range queries

Posted by aaron morton <aa...@thelastpickle.com>.
It is hashed once.

To the partitioner it's just some bytes. Other parts of the code care about its structure.
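
A rough sketch of "hashed once", under stated assumptions: pack_composite uses a simplified layout (2-byte length, the bytes, one end-of-component byte) modeled on the composite encoding, and token is an MD5-derived integer standing in for what a hashing partitioner produces. Both function names are hypothetical and neither is Cassandra's actual code:

```python
import hashlib
import struct

def pack_composite(parts):
    """Serialize each part as <2-byte length><bytes><end-of-component byte>."""
    out = b""
    for part in parts:
        raw = part.encode("utf-8")
        out += struct.pack(">H", len(raw)) + raw + b"\x00"
    return out

def token(key_bytes):
    """Stand-in for a partitioner: one hash over the whole byte string."""
    return int.from_bytes(hashlib.md5(key_bytes).digest(), "big")

keys = [("A", "B", "C"), ("A", "D", "C"), ("A", "E", "X"), ("A", "R", "X")]

# One token per key: the hash never looks at the individual components.
tokens = {key: token(pack_composite(key)) for key in keys}
print(len(tokens))
```

Four composite keys give four tokens, however many parts each key has.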

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com


Re: About Composite range queries

Posted by Cyril Auburtin <cy...@gmail.com>.
Thanks for the answer.
One more thing: I guess a composite key is not hashed only once?
Is it hashed once for each part the composite has?
So does this mean there are two or three or ... times as many keys as for
normal column keys?

Re: About Composite range queries

Posted by aaron morton <aa...@thelastpickle.com>.
Composite columns compare each part in turn, so the values are ordered as you've shown them.

However, the rows are not ordered by key value. They are ordered using the random token generated by the partitioner; see http://wiki.apache.org/cassandra/FAQ#range_rp
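
To illustrate the two orderings, here is a small sketch. It assumes Python tuples as a stand-in for composite column names and an MD5-derived integer as a stand-in for the partitioner token. The row keys key2 and key3 are made up for the example:

```python
import hashlib

# Columns inside one row: tuples compare part by part, first component first.
columns = [("B", "A", "C"), ("A", "C", "C"), ("A", "A", "C"),
           ("A", "D", "C"), ("A", "B", "C")]
sorted_columns = sorted(columns)

def token(row_key):
    """Stand-in for a hashing partitioner (not Cassandra's real code)."""
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")

# Rows: ordered by token, which has no relation to the order of the keys.
row_keys = [b"key1", b"key2", b"key3"]
rows_in_token_order = sorted(row_keys, key=token)

print(sorted_columns)
print(rows_in_token_order)
```

The columns come out as (A:A:C), (A:B:C), (A:C:C), (A:D:C), (B:A:C), while the row order depends only on the hash values.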

> What is the real advantage compared to super column families?
They are faster. 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
