Posted to commits@cassandra.apache.org by "Tyler Hobbs (JIRA)" <ji...@apache.org> on 2015/05/06 18:32:02 UTC
[jira] [Reopened] (CASSANDRA-8717) Top-k queries with custom secondary indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Hobbs reopened CASSANDRA-8717:
------------------------------------
This caused a regression in the [select_distinct_with_deletions dtest|http://cassci.datastax.com/job/trunk_dtest/82/testReport/cql_tests/TestCQL/select_distinct_with_deletions_test/], as confirmed by git bisect. If a fix for this isn't quick, can we revert the commit until it's fixed?
> Top-k queries with custom secondary indexes
> -------------------------------------------
>
> Key: CASSANDRA-8717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Andrés de la Peña
> Assignee: Andrés de la Peña
> Priority: Minor
> Labels: 2i, secondary_index, sort, sorting, top-k
> Fix For: 2.1.6
>
> Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch, 0002-Add-support-for-top-k-queries-in-2i.patch, 0003-Add-support-for-top-k-queries-in-2i.patch, 0004-Add-support-for-top-k-queries-in-2i.patch, 8717-v5.txt
>
>
> As presented in [Cassandra Summit Europe 2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be modified to support general top-k queries with minimal changes to the Cassandra codebase. This way, custom 2i implementations could provide relevance search, sorting by columns, etc.
> Top-k queries retrieve the k best results for a certain query. That implies querying the k best rows in each token range and then sorting them to obtain the k globally best rows.
> To do that, we propose two additional methods in the class SecondaryIndexSearcher:
> {code:java}
> public boolean requiresFullScan(List<IndexExpression> clause)
> {
>     return false;
> }
>
> public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
> {
>     return rows;
> }
> {code}
> The first one indicates whether a query performed against the index requires querying all the nodes in the ring. This is necessary for top-k queries because we do not know in advance which nodes hold the best results. The second method specifies how to sort the combined partial results from the nodes according to the query.
> Then we add two similar methods to the class AbstractRangeCommand:
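> {code:java}
> // Assignment presumably made in the constructor; the enclosing code is omitted here.
> this.searcher = Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);
>
> // Without a searcher, fall back to the default behaviour (no full scan).
> public boolean requiresFullScan()
> {
>     return searcher == null ? false : searcher.requiresFullScan(rowFilter);
> }
>
> // Without a searcher, only trim; otherwise sort with the searcher first, then trim.
> public List<Row> combine(List<Row> rows)
> {
>     return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
> }
> {code}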
> Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as shown in the attached patch.
> We think that the proposed approach provides very useful functionality with minimal impact on the current codebase.
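
As a rough illustration of the combine step described in the quoted proposal, the following standalone sketch merges per-range partial results, sorts them with a searcher, and trims to k. It does not use Cassandra classes: ScoredRow, Searcher, BY_SCORE and the explicit limit parameter are hypothetical stand-ins invented for this example, and the real patch may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TopKSketch {
    /** Hypothetical stand-in for a row: a key plus a relevance score. */
    static final class ScoredRow {
        final String key;
        final double score;
        ScoredRow(String key, double score) { this.key = key; this.score = score; }
    }

    /** Hypothetical interface mirroring the two proposed searcher hooks. */
    interface Searcher {
        boolean requiresFullScan();
        List<ScoredRow> sort(List<ScoredRow> rows);
    }

    /** Example searcher (an assumption): best rows are those with the highest score. */
    static final Searcher BY_SCORE = new Searcher() {
        public boolean requiresFullScan() { return true; } // top-k must see every range
        public List<ScoredRow> sort(List<ScoredRow> rows) {
            List<ScoredRow> sorted = new ArrayList<>(rows);
            sorted.sort((a, b) -> Double.compare(b.score, a.score)); // descending
            return sorted;
        }
    };

    /** Keeps at most `limit` rows from the front of the list. */
    static List<ScoredRow> trim(List<ScoredRow> rows, int limit) {
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    /** Merges the partial results of all ranges; sorts before trimming if a searcher exists. */
    static List<ScoredRow> combine(Searcher searcher, List<List<ScoredRow>> perRange, int k) {
        List<ScoredRow> all = new ArrayList<>();
        for (List<ScoredRow> partial : perRange) {
            all.addAll(partial);
        }
        return searcher == null ? trim(all, k) : trim(searcher.sort(all), k);
    }

    public static void main(String[] args) {
        // Two token ranges, each already reduced to its own 2 best rows.
        List<List<ScoredRow>> perRange = Arrays.asList(
            Arrays.asList(new ScoredRow("a", 0.9), new ScoredRow("b", 0.4)),
            Arrays.asList(new ScoredRow("c", 0.7), new ScoredRow("d", 0.95)));
        for (ScoredRow row : combine(BY_SCORE, perRange, 2)) {
            System.out.println(row.key + " " + row.score); // d 0.95, then a 0.9
        }
    }
}
```

Sorting the merged partial results before trimming is what makes the result globally correct: trimming first would keep whichever rows happened to arrive first rather than the best ones.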
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)