Posted to commits@cassandra.apache.org by "Benjamin Lerer (JIRA)" <ji...@apache.org> on 2014/12/02 14:40:16 UTC

[jira] [Commented] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

    [ https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231477#comment-14231477 ] 

Benjamin Lerer commented on CASSANDRA-4476:
-------------------------------------------

{quote}I see it as a trade-off between code complexity and query performance. As Sylvain explained in his earlier comment, more than one indexed column means ALLOW FILTERING, for which all bets are off in terms of performance anyway.{quote}

In the query {{Select * from myTable where a > 1 and a < 3}}, there is only one indexed column, {{a}}, so this query does not need filtering and its performance should be predictable.
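To illustrate why a range on a single indexed column needs no filtering, here is a minimal Java sketch that models a local index as a sorted map from indexed value to row keys. The class RangeIndexSketch, its methods and the keys k1..k5 are hypothetical and are not Cassandra code; the TreeMap only stands in for the ordering that LocalPartitioner gives an index table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a local secondary index on column "a".
class RangeIndexSketch {
    // Indexed value -> primary keys of the rows holding that value.
    // Entries are kept sorted by the indexed value.
    private final NavigableMap<Integer, Set<String>> index = new TreeMap<>();

    void insert(String primaryKey, int a) {
        index.computeIfAbsent(a, k -> new TreeSet<>()).add(primaryKey);
    }

    // Answers "a > low AND a < high" by walking only the contiguous slice of
    // index entries inside the range; entries outside it are never visited.
    List<String> rangeQuery(int low, int high) {
        List<String> result = new ArrayList<>();
        if (low >= high) {
            return result; // empty range, nothing can match
        }
        for (Set<String> keys : index.subMap(low, false, high, false).values()) {
            result.addAll(keys);
        }
        return result;
    }

    public static void main(String[] args) {
        RangeIndexSketch idx = new RangeIndexSketch();
        idx.insert("k1", 1);
        idx.insert("k2", 2);
        idx.insert("k3", 2);
        idx.insert("k4", 3);
        idx.insert("k5", 5);
        System.out.println(idx.rangeQuery(1, 3)); // prints [k2, k3]
    }
}
```

Because the entries are sorted by the indexed value, {{a > 1 and a < 3}} maps to one contiguous slice, so the work grows with the number of matching entries rather than with the size of the table.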

{quote}While it is good to strive to deliver optimal performance, I think the use case you are describing is rare.{quote}

It is a common use case. It is used a lot with time series data, for example, when people want to analyse what happened over a range of dates.

{quote}Jonathan Ellis described “When Not to Use Secondary Indexes” in a blog post: “Do not use secondary indexes to query a huge volume of records for a small number of results.”{quote}

Jonathan's statement is true, but it has nothing to do with the ability to perform range queries on an index. It is about choosing the right tool to query data based on your data distribution.

{quote}so for the proper use of indexed queries this shouldn't have a significant effect, but it would make the code more complex.{quote}
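Actually, if you think about it, you will realize that it can have a big impact: if the less selective index is used first, the query has to read far more index entries and filter far more rows than necessary.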
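A minimal Java sketch of choosing the index by estimated selectivity, using the country/birthdate example from the issue description. It assumes exact per-value row counts are available for each indexed column (a real implementation would need cheaper estimates). SelectivitySketch, Predicate, addRows and estimate are hypothetical names; highestSelectivityPredicate only borrows the name of the KeysSearcher method and does not reproduce its code. Dates are ISO strings so that lexicographic order matches chronological order. Requires Java 16+ for records.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class SelectivitySketch {
    // Hypothetical predicate on one indexed column: the value must lie between
    // low and high. An EQ clause is the degenerate case low == high, both inclusive.
    record Predicate(String column, String low, boolean lowInclusive,
                     String high, boolean highInclusive) {
        static Predicate eq(String column, String value) {
            return new Predicate(column, value, true, value, true);
        }
        static Predicate between(String column, String low, String high) {
            return new Predicate(column, low, false, high, false);
        }
    }

    // Hypothetical statistics: column -> (indexed value -> number of rows),
    // sorted by value the way a local index orders its entries.
    private final Map<String, NavigableMap<String, Long>> stats = new HashMap<>();

    void addRows(String column, String value, long count) {
        stats.computeIfAbsent(column, c -> new TreeMap<>()).merge(value, count, Long::sum);
    }

    // Estimated matching rows: sum of the counts in the contiguous slice of
    // values that satisfy the predicate.
    long estimate(Predicate p) {
        NavigableMap<String, Long> values = stats.getOrDefault(p.column(), new TreeMap<>());
        if (p.low().compareTo(p.high()) > 0) {
            return 0; // inverted bounds match nothing
        }
        long total = 0;
        for (long c : values.subMap(p.low(), p.lowInclusive(), p.high(), p.highInclusive()).values()) {
            total += c;
        }
        return total;
    }

    // Picks the predicate expected to match the fewest rows, EQ or not.
    Predicate highestSelectivityPredicate(List<Predicate> predicates) {
        return Collections.min(predicates, Comparator.comparingLong(this::estimate));
    }

    public static void main(String[] args) {
        SelectivitySketch s = new SelectivitySketch();
        s.addRows("country", "US", 1000);
        s.addRows("country", "FR", 200);
        s.addRows("birthdate", "2009-03-01", 5);
        s.addRows("birthdate", "2009-05-10", 7);
        s.addRows("birthdate", "2010-01-01", 50);

        List<Predicate> where = List.of(
                Predicate.eq("country", "US"),
                Predicate.between("birthdate", "2009-01", "2009-07"));
        System.out.println(s.highestSelectivityPredicate(where).column()); // prints birthdate
    }
}
```

With 1000 rows for country = 'US' but only 12 birthdates inside the range, the non-EQ predicate wins, which is exactly the case where always preferring the EQ clause would scan the wrong index.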
 

> Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4476
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API, Core
>            Reporter: Sylvain Lebresne
>            Assignee: Oded Peer
>            Priority: Minor
>              Labels: cql
>             Fix For: 3.0
>
>         Attachments: 4476-2.patch, 4476-3.patch, cassandra-trunk-4476.patch
>
>
> Currently, a query that uses 2ndary indexes must have at least one EQ clause (on an indexed column). Given that indexed CFs are local (and use LocalPartitioner, which orders the rows by the type of the indexed column), we should extend 2ndary indexes to allow querying indexed columns even when no EQ clause is provided.
> As far as I can tell, the main problem to solve for this is to update KeysSearcher.highestSelectivityPredicate(), i.e. to work out how to estimate the selectivity of non-EQ clauses. I note, however, that if we can make that estimate reasonably accurately, this might provide better performance even for index queries that have both EQ and non-EQ clauses, because some non-EQ clauses may have a much better selectivity than EQ ones (say you index both the user country and birth date: for SELECT * FROM users WHERE country = 'US' AND birthdate > 'Jan 2009' AND birthdate < 'July 2009', you'd better use the birthdate index first).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)