Posted to commits@cassandra.apache.org by "Jeremiah Jordan (JIRA)" <ji...@apache.org> on 2014/12/02 20:05:14 UTC
[jira] [Comment Edited] (CASSANDRA-4476) Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)

    [ https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231898#comment-14231898 ] 

Jeremiah Jordan edited comment on CASSANDRA-4476 at 12/2/14 7:04 PM:
---------------------------------------------------------------------

I think you need to revisit the issue of result ordering. Unless the full result set is in token order, you cannot page through the results from the secondary index. Both internal and user-driven paging rely on being able to start the next "page" from the token the previous page ended on. With an implementation that does not return results in token order, you cannot send the "end token" of the previous page as the "start token" for the next page: doing so skips every row under a later index value whose token is lower than that end token. For example:

Dataset:
{noformat}
(token(key), indexed)
(1, 6), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5), (7, 6), (8, 6)
{noformat}

{noformat}
select token(key),indexed from temp where indexed > 4 limit 3;
3, 5
4, 5
5, 5
{noformat}

Then, if results are not returned in token order, the next page is:

{noformat}
select token(key),indexed from temp where indexed > 4 and token(key) > 5 limit 3;
6, 5
7, 6
8, 6
{noformat}

You just skipped (1, 6) and (2, 6), and there is no way to get them back.
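
To make the skipped rows concrete, here is a small Python simulation of the example above. It is only a sketch: `query` and `page_all` are hypothetical helpers, not Cassandra code, and "index order" is assumed to mean sorted by indexed value and then by token, which is what the example output implies.

```python
# (token(key), indexed) pairs from the dataset above.
ROWS = [(1, 6), (2, 6), (3, 5), (4, 5), (5, 5), (6, 5), (7, 6), (8, 6)]


def query(rows, min_indexed, after_token=None, limit=3, order="index"):
    """Mimic: WHERE indexed > min_indexed [AND token(key) > after_token] LIMIT limit."""
    matches = [
        r for r in rows
        if r[1] > min_indexed and (after_token is None or r[0] > after_token)
    ]
    if order == "index":
        # Assumed index order: by indexed value, then by token.
        matches.sort(key=lambda r: (r[1], r[0]))
    else:
        # Token order: by token only.
        matches.sort(key=lambda r: r[0])
    return matches[:limit]


def page_all(rows, min_indexed, limit=3, order="index"):
    """Page through results, restarting each page after the last token seen."""
    seen = []
    after = None
    while True:
        page = query(rows, min_indexed, after, limit, order)
        if not page:
            return seen
        seen.extend(page)
        after = page[-1][0]  # "end token" of this page becomes the next "start token"


index_paged = page_all(ROWS, 4, order="index")
token_paged = page_all(ROWS, 4, order="token")
missing = sorted(set(ROWS) - set(index_paged))
print("index order pages:", index_paged)
print("token order pages:", token_paged)
print("never returned with index order:", missing)
```

With index-ordered results the loop never returns (1, 6) or (2, 6); with token-ordered results all eight rows come back.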

The next issue is that the result set merging code relies on results being in token order. So when you run the query at any consistency level higher than ONE and need to merge results from multiple nodes, that code will break when the results jump from (6, 5) back to (1, 6).
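
As a rough illustration of why the merge depends on ordering, the sketch below uses a generic sorted-stream merge (`heapq.merge`) with adjacent-duplicate removal. This is a stand-in chosen for illustration, not Cassandra's actual merging code; `merge_replicas` is a hypothetical name, and the two replicas are assumed to return identical rows.

```python
import heapq


def merge_replicas(*replica_results):
    """Merge per-replica results, assuming each input is sorted by token.

    Rows with the same token are expected to arrive next to each other,
    so only adjacent duplicates are collapsed.
    """
    merged = heapq.merge(*replica_results, key=lambda row: row[0])
    out = []
    for row in merged:
        if not out or out[-1][0] != row[0]:
            out.append(row)
    return out


# Two replicas returning the same two rows, as (token(key), indexed).
in_token_order = [(1, 6), (6, 5)]
in_index_order = [(6, 5), (1, 6)]  # index order: indexed = 5 before indexed = 6

good = merge_replicas(in_token_order, in_token_order)
bad = merge_replicas(in_index_order, in_index_order)
print("token-ordered inputs:", good)
print("index-ordered inputs:", bad)
```

When the inputs are token ordered, the duplicates line up and collapse into two rows. When they are in index order, the merge's sortedness assumption is violated, so the same rows are emitted twice and the output is not in token order.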




> Support 2ndary index queries with only inequality clauses (LT, LTE, GT, GTE)
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4476
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API, Core
>            Reporter: Sylvain Lebresne
>            Assignee: Oded Peer
>            Priority: Minor
>              Labels: cql
>             Fix For: 3.0
>
>         Attachments: 4476-2.patch, 4476-3.patch, cassandra-trunk-4476.patch
>
>
> Currently, a query that uses 2ndary indexes must have at least one EQ clause (on an indexed column). Given that indexed CFs are local (and use LocalPartitioner, which orders the rows by the type of the indexed column), we should extend 2ndary indexes to allow querying indexed columns even when no EQ clause is provided.
> As far as I can tell, the main problem to solve for this is to update KeysSearcher.highestSelectivityPredicate(), i.e. how do we estimate the selectivity of non-EQ clauses? I note, however, that if we can make that estimate reasonably accurate, this might provide better performance even for index queries that have both EQ and non-EQ clauses, because some non-EQ clauses may have much better selectivity than EQ ones (say you index both the user country and birth date: for SELECT * FROM users WHERE country = 'US' AND birthdate > 'Jan 2009' AND birthdate < 'July 2009', you'd better use the birthdate index first).


