You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "DOAN DuyHai (JIRA)" <ji...@apache.org> on 2013/08/29 11:08:52 UTC
[jira] [Commented] (CASSANDRA-4851) CQL3: improve support for paginating over composites

    [ https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753447#comment-13753447 ] 

DOAN DuyHai commented on CASSANDRA-4851:
----------------------------------------

Big +10 for this feature

Right now I am preparing some slides for a talk and tutorial on Cassandra to convince people switching from Thrift to CQL3. However I am facing serious issue because of the limitation of CQL3 not being able to allow inequality on more than 1 clustered component at a time.

 My example is quite trivial

{code:sql}
 CREATE TABLE comment_index_by_rating
 (
    songId uuid,
    rating int,  // rating is integer from 1 to 10
    date uuid, // date of the comment
    comment text, //comment message itself 
    userLogin text, //login of the user who posts the comment
    PRIMARY KEY (songId,rating,date)
 )
{code}


 I would like to paginate over comment so the first query would be

{code:sql}
 SELECT * FROM comment_index_by_rating WHERE songId = .... ORDER BY rating DESC LIMIT 10; // fetch first 10 comments
{code}

 The following queries would be:

{code:sql}
 SELECT * FROM comment_index_by_rating WHERE songId = .... AND rating <= {rating_of_last_comment_of_previous_batch} AND date <= {date_of_last_comment_of_previous_batch}
{code}

 Right now it is just IMPOSSIBLE to paginate like this, which is PITA.

 I know that there is already jira https://issues.apache.org/jira/browse/CASSANDRA-4415  which is a really good idea but the issue raised above is *beyond the scope of just paging data*.

 People are using more and more compound primary keys to model with Cassandra and they should be able to do slice queries with inequality from all compound components.

 There are lots of use cases where such usage is required

For example indexing daily metrics
{code:sql}
CREATE TABLE daily_metrics
(
  day int, // day in YYYYMMDD format
  hour int, 
  minute int,
  second int,
  metrics blob, 
  PRIMARY KEY (day, hour, minute, second)
)
{code}

 I should be able to grep all metrics from a range of date

  // select all metrics from 8:30am to 10am
 {code:sql}
 SELECT metrics FROM daily_metrics WHERE day = 20130828 AND hour >= 8 AND minute >= 30 and hour <= 10
 {code}


                
> CQL3: improve support for paginating over composites
> ----------------------------------------------------
>
>                 Key: CASSANDRA-4851
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> Consider the following table:
> {noformat}
> CREATE TABLE test (
>     k int,
>     c1 int,
>     c2 int,
>     PRIMARY KEY (k, c1, c2)
> )
> {noformat}
> with the following data:
> {noformat}
> k | c1 | c2
> ------------
> 0 | 0  | 0
> 0 | 0  | 1
> 0 | 1  | 0
> 0 | 1  | 1
> {noformat}
> Currently, CQL3 allows to slice over either c1 or c2:
> {noformat}
> SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2
> SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2
> {noformat}
> but you cannot express a query that return the 3 last records. Indeed, for that you would need to do a query like say:
> {noformat}
> SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c2 > 0)
> {noformat}
> but we don't support that.
> This can make it hard to paginate over say all records for {{k = 0}} (I'm saying "can" because if the value for c2 cannot be very large, an easy workaround could be to paginate by entire value of c1, which you can do).
> For the case where you only paginate to avoid OOMing on a query, CASSANDRA-4415 will that and is probably the best solution. However, there may be case where the pagination is say user (as in, the user of your application) triggered.
> I note that one solution would be to add the OR support at least in case like the one above. That's definitively doable but on the other side, we won't be able to support full-blown OR, so it may not be very natural that we support seemingly random combination of OR and not others.
> Another solution would be to allow the following syntax:
> {noformat}
> SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0)
> {noformat}
> which would literally mean that you want records where the values of c1 and c2 taken as a tuple is lexicographically greater than the tuple (0, 0). This is less SQL-like (though maybe some SQL store have that, it's a fairly thing to have imo?), but would be much simpler to implement and probably to use too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira