Posted to user@cassandra.apache.org by Adam Fisk <a...@littleshoot.org> on 2009/12/16 04:35:21 UTC

Re: date range queries

This is still baffling me a bit, Jonathan. I'm running from a trunk
snapshot from a few days ago (2009-12-07).

I'm creating time UUIDs using "JUG" (http://jug.safehaus.org/) as my
SuperColumn key names for batch_insert calls. That seems excessive but
works - the only Java lib I found for creating time-based UUIDs. I'm
successfully calling batch_insert using those UUID bytes as my
SuperColumn names. I think this is correct, but please correct me if
not.

Then, however, I can't figure out how to get data for more than a
single SuperColumn from get_slice. It seems the way to do this would
be to specify a ColumnParent that only contains a ColumnFamily. When I
do this, however, I get the following exception:

InvalidRequestException(why:UUIDs must be exactly 16 bytes)
	at org.apache.cassandra.service.Cassandra$get_slice_result.read(Cassandra.java:3170)
	at org.apache.cassandra.service.Cassandra$Client.recv_get_slice(Cassandra.java:170)
	at org.apache.cassandra.service.Cassandra$Client.get_slice(Cassandra.java:144)

If I specify a SuperColumn name in the ColumnParent, I can get Columns
for just that SuperColumn, but I'm attempting to get a slice of
SuperColumns. Am I misunderstanding something? Is that a bug?

Thanks very much.

-Adam

On Wed, Nov 18, 2009 at 8:23 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> The easiest is to store the messages in a row with timeuuid column
> names.  Then you can just use get_slice in either forward or reverse
> order.
>
> On Wed, Nov 18, 2009 at 6:07 PM, Adam Fisk <a...@littleshoot.org> wrote:
>> First off, very impressive project -- thanks for everyone's hard work!
>> I'm wondering how I would do date range queries in Cassandra, say for
>> all messages for a given user in the last week.
>>
>> Can someone provide an example?
>>
>> Thanks so much.
>>
>> -Adam
>>
>> --
>> Adam Fisk
>> http://www.littleshoot.org | http://adamfisk.wordpress.com |
>> http://twitter.com/adamfisk
>>
>

-- 
Adam Fisk
http://www.littleshoot.org | http://adamfisk.wordpress.com |
http://twitter.com/adamfisk
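
[Editor's sketch] The quoted suggestion above, timeuuid column names plus get_slice, needs a time-based UUID to mark the start of the range (for example "one week ago"). The following is a minimal sketch using only the JDK, not JUG. The class name TimeUuidBounds and its methods are hypothetical, and whether such a value works as a slice boundary depends on how the server's TimeUUID comparator orders names, which this thread does not spell out.

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical helper for illustration; not part of JUG or Cassandra.
final class TimeUuidBounds {
    // Number of 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    private TimeUuidBounds() {}

    // Builds a version 1 UUID for the given instant with clock sequence and node set to zero.
    static UUID lowerBound(Instant when) {
        long ts = when.toEpochMilli() * 10_000L + UUID_EPOCH_OFFSET;
        long msb = (ts & 0xFFFFFFFFL) << 32        // time_low
                 | ((ts >>> 32) & 0xFFFFL) << 16   // time_mid
                 | 0x1000L                         // version 1
                 | ((ts >>> 48) & 0x0FFFL);        // time_hi
        long lsb = 0x8000000000000000L;            // RFC 4122 variant bits, zero clock seq and node
        return new UUID(msb, lsb);
    }

    // Recovers the instant (millisecond precision) from a version 1 UUID.
    static Instant instantOf(UUID timeUuid) {
        return Instant.ofEpochMilli((timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L);
    }
}
```

A "last week" query would then use the 16 bytes of lowerBound(Instant.now().minus(java.time.Duration.ofDays(7))) as the start of the slice and leave the end open.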

Re: date range queries

Posted by Jonathan Ellis <jb...@gmail.com>.

Sure.

On Wed, Dec 16, 2009 at 7:12 PM, Adam Fisk <a...@littleshoot.org> wrote:
> Definitely. Against 0.9 right?
>
> On Wed, Dec 16, 2009 at 10:29 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>> On Wed, Dec 16, 2009 at 12:15 PM, Adam Fisk <a...@littleshoot.org> wrote:
>>> That call made sense for columns, but not for SuperColumns of course,
>>> and that was the culprit. Ideally the various slice calls would do a
>>> quick check to make sure both the SlicePredicate and the ColumnParent
>>> are either both referring to SuperColumns or are both referring to
>>> Columns and fail fast if not, but now I'm just nitpicking.
>>
>> That is a good suggestion.  Can you create a ticket?
>>
>> -Jonathan
>>
>
>
>
> --
> Adam Fisk
> http://www.littleshoot.org | http://adamfisk.wordpress.com |
> http://twitter.com/adamfisk
>

Re: date range queries

Posted by Adam Fisk <a...@littleshoot.org>.

Definitely. Against 0.9 right?

On Wed, Dec 16, 2009 at 10:29 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Wed, Dec 16, 2009 at 12:15 PM, Adam Fisk <a...@littleshoot.org> wrote:
>> That call made sense for columns, but not for SuperColumns of course,
>> and that was the culprit. Ideally the various slice calls would do a
>> quick check to make sure both the SlicePredicate and the ColumnParent
>> are either both referring to SuperColumns or are both referring to
>> Columns and fail fast if not, but now I'm just nitpicking.
>
> That is a good suggestion.  Can you create a ticket?
>
> -Jonathan
>



-- 
Adam Fisk
http://www.littleshoot.org | http://adamfisk.wordpress.com |
http://twitter.com/adamfisk

Re: date range queries

Posted by Jonathan Ellis <jb...@gmail.com>.

You can either write a TimeUUID Partitioner, if that is the only kind
of key you have in your cluster, or use a different partitioner and
prefix the keys w/ a date in a format that sorts correctly in whatever
collation you are using, e.g. ISO 8601 for our standard
OrderedPartitioner.
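
[Editor's sketch] The date-prefix idea can be illustrated as follows. This is a sketch under assumptions: the class DateKeys, the ":" separator and the suffix argument are hypothetical, and it only shows that fixed-width ISO 8601 UTC strings compare in chronological order under Java's String ordering. The partitioner's own collation still has to agree with that ordering.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of Cassandra.
final class DateKeys {
    // Fixed-width pattern so every key has the same prefix length and sorts field by field.
    private static final DateTimeFormatter FIXED_WIDTH_UTC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    private DateKeys() {}

    // Prefixes an arbitrary key suffix with the timestamp in ISO 8601 form.
    static String rowKey(Instant when, String suffix) {
        return FIXED_WIDTH_UTC.format(when) + ":" + suffix;
    }
}
```

For example, rowKey(Instant.parse("2009-12-09T04:35:21Z"), "msg-1") gives "2009-12-09T04:35:21Z:msg-1", which compares lower than any key built from a later instant.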

On Wed, Dec 16, 2009 at 1:08 PM, Richard Grossman <ri...@gmail.com> wrote:
> I've the same requirement but in my case the date is the key of the CF so
> how to use the timeUUID if the date is the key and not a column ??
>
>
> On Wed, Dec 16, 2009 at 8:29 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> On Wed, Dec 16, 2009 at 12:15 PM, Adam Fisk <a...@littleshoot.org> wrote:
>> > That call made sense for columns, but not for SuperColumns of course,
>> > and that was the culprit. Ideally the various slice calls would do a
>> > quick check to make sure both the SlicePredicate and the ColumnParent
>> > are either both referring to SuperColumns or are both referring to
>> > Columns and fail fast if not, but now I'm just nitpicking.
>>
>> That is a good suggestion.  Can you create a ticket?
>>
>> -Jonathan
>
>

Re: date range queries

Posted by Richard Grossman <ri...@gmail.com>.

I have the same requirement, but in my case the date is the key of the CF.
How do I use the timeUUID if the date is the key and not a column?


On Wed, Dec 16, 2009 at 8:29 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Wed, Dec 16, 2009 at 12:15 PM, Adam Fisk <a...@littleshoot.org> wrote:
> > That call made sense for columns, but not for SuperColumns of course,
> > and that was the culprit. Ideally the various slice calls would do a
> > quick check to make sure both the SlicePredicate and the ColumnParent
> > are either both referring to SuperColumns or are both referring to
> > Columns and fail fast if not, but now I'm just nitpicking.
>
> That is a good suggestion.  Can you create a ticket?
>
> -Jonathan
>

Re: date range queries

Posted by Jonathan Ellis <jb...@gmail.com>.

On Wed, Dec 16, 2009 at 12:15 PM, Adam Fisk <a...@littleshoot.org> wrote:
> That call made sense for columns, but not for SuperColumns of course,
> and that was the culprit. Ideally the various slice calls would do a
> quick check to make sure both the SlicePredicate and the ColumnParent
> are either both referring to SuperColumns or are both referring to
> Columns and fail fast if not, but now I'm just nitpicking.

That is a good suggestion.  Can you create a ticket?

-Jonathan

Re: date range queries

Posted by Adam Fisk <a...@littleshoot.org>.

Fair enough, Jonathan. I actually just tracked it down -- the problem
was between my keyboard and my chair as usual. I neglected to comment
out the following call on my SlicePredicate:

sp.setColumn_names(Arrays.asList(toBytes("my_column_name")));

That call made sense for columns, but not for SuperColumns of course,
and that was the culprit. Ideally the various slice calls would do a
quick check to make sure both the SlicePredicate and the ColumnParent
are either both referring to SuperColumns or are both referring to
Columns and fail fast if not, but now I'm just nitpicking.

Working great now - thanks guys.

-Adam
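
[Editor's sketch] Until the server fails fast on this, a client-side guard can catch the mistake before the call goes out. This is a minimal sketch assuming the names in the predicate will be interpreted as TimeUUID SuperColumn names. SliceNameCheck and requireTimeUuidNames are hypothetical and not part of Cassandra's Thrift API.

```java
import java.util.List;

// Hypothetical client-side guard for illustration; not part of Cassandra.
final class SliceNameCheck {
    private SliceNameCheck() {}

    // Rejects any name that cannot be a binary UUID, mirroring the server's 16-byte rule.
    static void requireTimeUuidNames(List<byte[]> names) {
        for (byte[] name : names) {
            if (name.length != 16) {
                throw new IllegalArgumentException(
                        "UUIDs must be exactly 16 bytes, got " + name.length);
            }
        }
    }
}
```

The bytes of "my_column_name" are 14 bytes long, so a guard like this would reject them immediately when the predicate is aimed at TimeUUID SuperColumns.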


On Tue, Dec 15, 2009 at 7:53 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> I can't tell without seeing your CF definitions and code where you are
> not passing a UUID where cassandra expects one :)



-- 
Adam Fisk
http://www.littleshoot.org | http://adamfisk.wordpress.com |
http://twitter.com/adamfisk

Re: date range queries

Posted by Jonathan Ellis <jb...@gmail.com>.

I can't tell without seeing your CF definitions and code where you are
not passing a UUID where cassandra expects one :)


Re: date range queries

Posted by Adam Fisk <a...@littleshoot.org>.

Hi Tatu- Thanks for the response. I don't think that's it because it's
successfully inserting them using those UUIDs, and I'm using the JUG
toByteArray call. Good point on the Ethernet access in Java 1.6 --
I'll probably just write my own function.

I'll post some more useful code in response to Jonathan.

-Adam


On Tue, Dec 15, 2009 at 11:17 PM, Tatu Saloranta <ts...@gmail.com> wrote:
> On Tue, Dec 15, 2009 at 7:35 PM, Adam Fisk <a...@littleshoot.org> wrote:
>> This is still baffling me a bit, Jonathan. I'm running from a trunk
>> snapshot from a few days ago (2009-12-07).
>>
>> I'm creating time UUIDs using "JUG" (http://jug.safehaus.org/) as my
>> SuperColumn key names for batch_insert calls. That seems excessive but
>> works - the only Java lib I found for creating time-based UUIDs. I'm
>> successfully calling batch_insert using those UUID bytes as my
>> SuperColumn names. I think this is correct, but please correct me if
>> not.
>>
>> Then, however, I can't figure out how to get any more than data for a
>> single SuperColumn from get_slice. It seems the way to do this would
>> be to specify a ColumnParent that only contains a ColumnFamily. When I
>> do this, however, I get the following exception:
>>
>> InvalidRequestException(why:UUIDs must be exactly 16 bytes)
>>        at org.apache.cassandra.service.Cassandra$get_slice_result.read(Cassandra.java:3170)
>>        at org.apache.cassandra.service.Cassandra$Client.recv_get_slice(Cassandra.java:170)
>>        at org.apache.cassandra.service.Cassandra$Client.get_slice(Cassandra.java:144)
>>
>
> Perhaps you are converting UUIDs to String representation (36 ascii
> chars == 36 bytes), instead of binary (byte[16])? JUG can give out
> both, it's a rather simple lib (and yes, that JDK variant only dishes
> out random number based ones is bizarre, but partly since Ethernet
> address was only recently exposed in JDK 1.6 or so)
>
> -+ Tatu +-
>



-- 
Adam Fisk
http://www.littleshoot.org | http://adamfisk.wordpress.com |
http://twitter.com/adamfisk

Re: date range queries

Posted by Tatu Saloranta <ts...@gmail.com>.

On Tue, Dec 15, 2009 at 7:35 PM, Adam Fisk <a...@littleshoot.org> wrote:
> This is still baffling me a bit, Jonathan. I'm running from a trunk
> snapshot from a few days ago (2009-12-07).
>
> I'm creating time UUIDs using "JUG" (http://jug.safehaus.org/) as my
> SuperColumn key names for batch_insert calls. That seems excessive but
> works - the only Java lib I found for creating time-based UUIDs. I'm
> successfully calling batch_insert using those UUID bytes as my
> SuperColumn names. I think this is correct, but please correct me if
> not.
>
> Then, however, I can't figure out how to get any more than data for a
> single SuperColumn from get_slice. It seems the way to do this would
> be to specify a ColumnParent that only contains a ColumnFamily. When I
> do this, however, I get the following exception:
>
> InvalidRequestException(why:UUIDs must be exactly 16 bytes)
>        at org.apache.cassandra.service.Cassandra$get_slice_result.read(Cassandra.java:3170)
>        at org.apache.cassandra.service.Cassandra$Client.recv_get_slice(Cassandra.java:170)
>        at org.apache.cassandra.service.Cassandra$Client.get_slice(Cassandra.java:144)
>

Perhaps you are converting UUIDs to their String representation (36 ASCII
chars == 36 bytes) instead of binary (byte[16])? JUG can give out
both; it's a rather simple lib. (And yes, it is bizarre that the JDK
variant only dishes out random-number-based ones, but that is partly
because the Ethernet address was only recently exposed, in JDK 1.6 or so.)

-+ Tatu +-
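
[Editor's sketch] The difference between the two representations can be shown with the JDK's own java.util.UUID. This is a minimal sketch, not JUG's API: the class UuidBytes and its method names are hypothetical, and the big-endian layout is the standard RFC 4122 byte order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper for illustration; not part of JUG or Cassandra.
final class UuidBytes {
    private UuidBytes() {}

    // Binary form: the two 64-bit halves packed big-endian into 16 bytes.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Textual form: 32 hex digits plus 4 hyphens, so 36 ASCII bytes.
    static byte[] toStringBytes(UUID uuid) {
        return uuid.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Inverse of toBytes; rejects anything that is not exactly 16 bytes.
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("UUIDs must be exactly 16 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```

Only the output of toBytes has the 16-byte length that the exception message asks for; the output of toStringBytes is 36 bytes long.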