Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2013/12/04 09:54:37 UTC

[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only non-EQ clauses

    [ https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838743#comment-13838743 ] 

Sylvain Lebresne commented on CASSANDRA-4476:
---------------------------------------------

bq. how much more complicated does CASSANDRA-4511 make this?

Depends on what you mean by "this". Under the hood, I could be missing something, but a priori I don't think CASSANDRA-4511 adds much complexity, if any. But if we want to extend non-EQ clauses to collections, we'd need to come up with a syntax to express "where set s has a value greater than 3". I'd definitely advise leaving that to a follow-up ticket, especially because I'm not entirely sure this is generally useful.

A priori, this ticket is not all that hard. When we query the index, instead of querying a single index row, we need to support querying a range of them. After that, the rest of the index code should remain unchanged.
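To make the "one index row versus a range of index rows" idea concrete, here is a minimal toy sketch. It is not Cassandra code: `ToyIndex`, `query_eq` and `query_range` are hypothetical names, and the only assumption taken from the ticket is that index rows are kept ordered by the indexed value.

```python
from bisect import bisect_left, bisect_right


class ToyIndex:
    """Hypothetical stand-in for a local secondary index.

    One "index row" per indexed value, with rows kept sorted by that value.
    """

    def __init__(self):
        self._values = []  # sorted indexed values (the index row keys)
        self._rows = {}    # indexed value -> set of base-table primary keys

    def insert(self, value, primary_key):
        if value not in self._rows:
            self._values.insert(bisect_left(self._values, value), value)
            self._rows[value] = set()
        self._rows[value].add(primary_key)

    def query_eq(self, value):
        # EQ clause: read exactly one index row.
        return sorted(self._rows.get(value, ()))

    def query_range(self, low=None, high=None,
                    low_inclusive=False, high_inclusive=False):
        # Non-EQ clause: read a contiguous slice of index rows instead.
        start = 0
        if low is not None:
            start = (bisect_left(self._values, low) if low_inclusive
                     else bisect_right(self._values, low))
        end = len(self._values)
        if high is not None:
            end = (bisect_right(self._values, high) if high_inclusive
                   else bisect_left(self._values, high))
        keys = set()
        for value in self._values[start:end]:
            keys |= self._rows[value]
        return sorted(keys)
```

In this sketch everything downstream of the lookup only ever sees a list of primary keys, which is the sense in which the rest of the index code could stay unchanged.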

Of course, we will need to modify SelectStatement to let queries with non-EQ clauses pass validation, but that shouldn't be too difficult. As said above, the only remaining question is how to select which index to query when you have multiple indexed columns in the WHERE clause and some of them have non-EQ clauses: how do you estimate which index is likely to be the most selective? That being said, more than one indexed column means ALLOW FILTERING, for which all bets are off in terms of performance anyway, so for a first version of the patch we could go with a very simplistic heuristic (say, prefer the index with an EQ clause if there is one, and if there is none just pick the first index) and leave smarter heuristics for later.
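The simplistic heuristic could look roughly like the sketch below. The `Clause` type and `pick_index_clause` function are hypothetical and only illustrate the rule "prefer an EQ clause on an indexed column, otherwise take the first indexed clause"; they do not correspond to any existing Cassandra class or method.

```python
from typing import NamedTuple, Optional, Sequence, Set


class Clause(NamedTuple):
    """Hypothetical representation of one WHERE-clause restriction."""
    column: str
    operator: str  # one of "=", "<", "<=", ">", ">="
    value: object


def pick_index_clause(clauses: Sequence[Clause],
                      indexed_columns: Set[str]) -> Optional[Clause]:
    """Choose which indexed clause drives the index query.

    Rule: prefer the first EQ clause on an indexed column; if there is none,
    fall back to the first clause on an indexed column. No selectivity
    estimate is attempted.
    """
    candidates = [c for c in clauses if c.column in indexed_columns]
    for clause in candidates:
        if clause.operator == "=":
            return clause
    return candidates[0] if candidates else None


# Usage, mirroring the example in the ticket description:
where = [
    Clause("birthdate", ">", "Jan 2009"),
    Clause("country", "=", "US"),
    Clause("birthdate", "<", "July 2009"),
]
chosen = pick_index_clause(where, {"country", "birthdate"})
```

With this rule `chosen` is the `country` clause, even though the birthdate range may well be more selective. That is exactly the limitation a smarter heuristic would address later.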


> Support 2ndary index queries with only non-EQ clauses
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4476
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API, Core
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 2.1
>
>
> Currently, a query that uses 2ndary indexes must have at least one EQ clause (on an indexed column). Given that indexed CFs are local (and use LocalPartitioner, which orders the rows by the type of the indexed column), we should extend 2ndary indexes to allow querying indexed columns even when no EQ clause is provided.
> As far as I can tell, the main problem to solve for this is to update KeysSearcher.highestSelectivityPredicate(), i.e. how do we estimate the selectivity of non-EQ clauses? I note however that if we can do that estimate reasonably accurately, this might provide better performance even for index queries that have both EQ and non-EQ clauses, because some non-EQ clauses may have a much better selectivity than EQ ones (say you index both the user country and birth date: for SELECT * FROM users WHERE country = 'US' AND birthdate > 'Jan 2009' AND birthdate < 'July 2009', you'd better use the birthdate index first).


