Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2013/08/30 15:12:51 UTC

[jira] [Resolved] (CASSANDRA-5956) Allow filtering on more than 1 clustered component in CQL3

     [ https://issues.apache.org/jira/browse/CASSANDRA-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-5956.
-----------------------------------------

    Resolution: Duplicate

bq. also CASSANDRA-4851 but the issue raised above is beyond the scope of just paging data

Well, it's not. Let's not play too much on words: the description of CASSANDRA-4851 is pretty clear that the goal is to allow slicing over composites (and that "paging" is just one motivation). But if that wasn't clear, let me confirm that this is what CASSANDRA-4851 will be about, so there is no reason to have 2 tickets. So closing this one as a duplicate.

bq. My example is quite trivial.

As a side note, while the example is trivial in its complexity, I can't really see any benefit in splitting a time into 3 columns as is done here, rather than having a single timestamp column (even if CASSANDRA-4851 were implemented).  A single timestamp column would offer greater precision while using less storage space. It also makes queries a bit easier to read imo and makes it easier to work with in general client side (because drivers know it's a time).
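To make the precision and size point concrete, here is a rough sketch in Python. It assumes a CQL int value is 4 bytes and a CQL timestamp value is an 8-byte count of milliseconds since the epoch. It only counts raw value bytes and ignores whatever per-cell overhead the storage engine adds, so treat it as an illustration rather than a measurement. The helper name to_millis is made up for this example.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt):
    """Milliseconds since the Unix epoch, as an integer (hypothetical helper)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


# Three 4-byte big-endian ints for hour, minute, second (assumed int encoding).
split_columns = struct.pack(">iii", 18, 30, 0)

# One 8-byte big-endian long holding milliseconds since the epoch
# (assumed timestamp encoding). Note the extra 250 ms of precision.
moment = datetime(2013, 8, 28, 18, 30, 0, 250000, tzinfo=timezone.utc)
single_column = struct.pack(">q", to_millis(moment))

split_size = len(split_columns)
single_size = len(single_column)
print(split_size, single_size)
```

The split form has no place to keep the 250 ms, while the single value carries it and is still smaller.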

So while I'd agree it's very easy to come up with toy examples where the absence of slicing over composites sounds very limiting, it is my experience that tables with multiple clustering columns are not *that* common in real models, and even when multiple clustering columns are useful, being able to slice in the same query over more than one of those columns is far from always needed. Anyway, just my 2 cents. I agree that this should be fixed ultimately, which is why I created CASSANDRA-4851 in the first place.

{quote}
// select all metrics of the day from 6:30pm
{noformat}
SELECT metrics FROM daily_metrics WHERE day = 20130828 AND hour >= 18 AND minute >= 30
{noformat}
{quote}

I'll note that this query does *not* do what the comment above it claims (it doesn't in SQL, and it would very arguably be a bug if it did in CQL).  That query selects all rows of day 20130828 that have *both* their hour component at or after 18 *and* their minute component at or after 30. So it does *not* select 7:01pm for instance (which is why I'm strongly leaning towards adding the tuple-like syntax in CASSANDRA-4851).
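The difference between the two readings can be sketched outside of CQL. The Python below is only an illustration: rows are modelled as (hour, minute) pairs, the sample data and function names are invented, and the tuple comparison stands in for the tuple-like syntax being considered, not for any existing CQL feature.

```python
# Sample (hour, minute) pairs for a single day; made-up data for illustration.
rows = [(18, 15), (18, 30), (18, 45), (19, 1), (19, 30), (23, 59)]


def per_column(rows, hour, minute):
    """hour >= H AND minute >= M: each column is restricted on its own."""
    return [r for r in rows if r[0] >= hour and r[1] >= minute]


def tuple_slice(rows, hour, minute):
    """(hour, minute) >= (H, M): lexicographic comparison, i.e. a real slice."""
    return [r for r in rows if r >= (hour, minute)]


independent = per_column(rows, 18, 30)
sliced = tuple_slice(rows, 18, 30)

print(independent)  # (19, 1) is missing: its minute is below 30
print(sliced)       # (19, 1) is present: 19:01 is after 18:30
```

Only the second form matches "everything from 6:30pm onwards".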

                
> Allow filtering on more than 1 clustered component in CQL3
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-5956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API, Core
>            Reporter: DOAN DuyHai
>            Priority: Minor
>
> Right now I am preparing some slides for a talk and tutorial on Cassandra to convince people to switch from Thrift to CQL3. However, I am facing issues because CQL3 does not allow inequality restrictions on more than 1 clustered component at a time.
>  My example is quite trivial. Let's consider a table to collect daily metrics
> {code:sql}
> CREATE TABLE daily_metrics
> (
>   day int, // day in YYYYMMDD format
>   hour int, 
>   minute int,
>   second int,
>   metrics blob, 
>   PRIMARY KEY (day, hour, minute, second)
> )
> {code}
>  I should be able to fetch all metrics for a given time range
>   // select all metrics from 8:30am to 10am
>  {code:sql}
>  SELECT metrics FROM daily_metrics WHERE day = 20130828 AND hour >= 8 AND minute >= 30 and hour <= 10
>  {code}
>  // select all metrics of the day from 6:30pm
>  {code:sql}
>  SELECT metrics FROM daily_metrics WHERE day = 20130828 AND hour >= 18 AND minute >= 30 
>  {code}
>  Right now it is just IMPOSSIBLE to do this kind of query with CQL3, which is PITA. We always get the error message
> {quote}
> Bad Request: PRIMARY KEY part minute cannot be restricted (preceding part hour is either not restricted or by a non-EQ relation)
> {quote}
>  Of course the example is trivial and I can just model the timestamp with 1 component by sticking hour, minute and second together. However the limitation is still there and indeed there is no technical limitation to allow such a query, except from some effort in CQL3 parsing and validation.
>  I know that there is already jira [CASSANDRA-4415|https://issues.apache.org/jira/browse/CASSANDRA-4415] which is a really  good idea and also [CASSANDRA-4851|https://issues.apache.org/jira/browse/CASSANDRA-4851] but the issue raised above is *beyond the scope of just paging data*.
>  People are using more and more compound primary keys to model with Cassandra and they should be able to do slice queries with inequality from all compound components.
