You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2013/01/11 10:00:18 UTC

[jira] [Comment Edited] (CASSANDRA-4936) Less than operator when comparing timeuuids behaves as less than equal.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550990#comment-13550990 ] 

Sylvain Lebresne edited comment on CASSANDRA-4936 at 1/11/13 9:00 AM:
----------------------------------------------------------------------

I really think the bug here is that TimeUUIDType.fromString() accepts a date as input. But a date is *not* a valid representation of a timeuuid, and the fromString method does *arbitrarily* pick some 0's for parts of the resulting UUID.

In other words, the SELECT query above should be invalid.

Now don't get me wrong, selecting timeuuid based on dates is useful but that is a slightly different problem. So what I think we should do is:
# refuse dates as valid timeuuid values because they just are not.
# add convenience methods to translate dates to precise timeuuid. For querying we would have 'startOf()' and 'endOf()' (where 'startOf(<date> )' (resp. 'endOf(<date> )') would return the *smallest* (resp. *biggest*) possible timeuuid at time <date> ). And for insertion we could optionally add 'random(<date> )' that would return a random timeuuid at time '<date>' (we could even accept 'now' as syntactic sugar for 'random(now)' if we feel like it).

That would also mean that cqlsh should stop this non-sense of displaying timeuuid like date. Again, I understand the intention of making it more readable but this will confuse generations of CQL3 users. I do am in favor of finding a non confusing way to make it readable for users. In fact one solution could be to handle that on the CQL side and to allow 'SELECT dateOf(x ) FROM ...' that would return a date string from timeuuid x (but now it's clear, you've explicitly asked for a lossy representation of x).

I note that this suggestion pretty much fixes the problem discussed in CASSANDRA-4284 too.

I note that Tyler's solution of basically automatically generating the startOf() and endOf() method under the cover based on whether we've use an inclusive of exclusive operation may appear seductive but I don't think we should do that because:
# if you do that, what about SELECT ... WHERE activity_id = '2012-11-07 18:18:22-0800'. You still have no solution for that and by doing magic under the carpet for < and >, you've in fact blurred what = really does.
# "it would just require passing along information about whether to create the highest or lowest TimeUUID representation for a given datestamp based on the comparison operator that's used" <- while this seem simple on principle, this will yield very *very* ugly special cases internally. This is *not* 2 lines of code.
# more generally, this doesn't solve the fact that date *are not* valid representation of timeuuid. For example, I still think the first point mentioned in CASSANDRA-4284 is a bug in its own right.

Allowing dates as valid representation of timeuuid is a bug, let's fix it.

                
      was (Author: slebresne):
    I really think the bug here is that TimeUUIDType.fromString() accepts a date as input. But a date is *not* a valid representation of a timeuuid, and the fromString method does *arbitrarily* pick some 0's for parts of the resulting UUID.

In other words, the SELECT query above should be invalid.

Now don't get me wrong, selecting timeuuid based on dates is useful but that is a slightly different problem. So what I think we should do is:
# refuse dates as valid timeuuid values because they just are not.
# add convenience methods (say 'startOf()' and 'endOf()') to translate dates to precise timeuuid. For querying we would have 'startOf()' and 'endOf()' (where 'startOf(<date>)' (resp. 'endOf(<date>)') would return the *smallest* (resp. *biggest*) possible timeuuid at time <date>). And for insertion we could optionally add 'random(<date>)' that would return a random timeuuid at time <date> (we could even accept 'now' as syntactic sugar for 'random(now)' if we feel like it).

That would also mean that cqlsh should stop this non-sense of displaying timeuuid like date. Again, I understand the intention of making it more readable but this will confuse generations of CQL3 users. I do am in favor of finding a non confusing way to make it readable for users. In fact one solution could be to handle that on the CQL side and to allow 'SELECT dateOf(x) FROM ...' that would return a date string from timeuuid x (but now it's clear, you've explicitly asked for a lossy representation of x).

I note that this suggestion pretty much fixes the problem discussed in CASSANDRA-4284 too.

I note that Tyler's solution of basically automatically generating the startOf() and endOf() method under the cover based on whether we've use an inclusive of exclusive operation may appear seductive but I don't think we should do that because:
# if you do that, what about SELECT ... WHERE activity_id = '2012-11-07 18:18:22-0800'. You still have no solution for that and by doing magic under the carpet for < and >, you've in fact blurred what = really does.
# "it would just require passing along information about whether to create the highest or lowest TimeUUID representation for a given datestamp based on the comparison operator that's used" <- while this seem simple on principle, this will yield very *very* ugly special cases internally. This is *not* 2 lines of code.
# more generally, this doesn't solve the fact that date *are not* valid representation of timeuuid. For example, I still think the first point mentioned in CASSANDRA-4284 is a bug in its own right.

Allowing dates as valid representation of timeuuid is a bug, let's fix it.

                  
> Less than operator when comparing timeuuids behaves as less than equal.
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-4936
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4936
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>         Environment: Linux CentOS.
> Linux localhost.localdomain 2.6.18-308.16.1.el5 #1 SMP Tue Oct 2 22:01:37 EDT 2012 i686 i686 i386 GNU/Linux
>            Reporter: Cesar Lopez-Nataren
>             Fix For: 1.2.2
>
>
> If we define the following column family using CQL3:
> CREATE TABLE useractivity (
>   user_id int,
>   activity_id 'TimeUUIDType',
>   data text,
>   PRIMARY KEY (user_id, activity_id)
> );
> Add some values to it.
> And then query it like:
> SELECT * FROM useractivity WHERE user_id = '3' AND activity_id < '2012-11-07 18:18:22-0800' ORDER BY activity_id DESC LIMIT 1;
> the record with timeuuid '2012-11-07 18:18:22-0800' returns in the results.
> According to the documentation, on CQL3 the '<' and '>' operators are strict, meaning not inclusive, so this seems to be a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira