You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sam Tunnicliffe (JIRA)" <ji...@apache.org> on 2015/08/26 12:14:45 UTC
[jira] [Commented] (CASSANDRA-10124) Full support for multi-column and per-row indexes

    [ https://issues.apache.org/jira/browse/CASSANDRA-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712882#comment-14712882 ] 

Sam Tunnicliffe commented on CASSANDRA-10124:
---------------------------------------------

Some proposals for syntax:

h6. Multi column custom index DDL
{code}
CREATE CUSTOM INDEX ON test_table(col1, col2, col3) USING 'com.foo.bar.CustomIndex';
{code} 
When the targets include non-frozen collections, I think in general we should disallow the use of qualifiers, i.e. keys()/entries(). The main reason for including those when creating non-custom indexes is so C* can select the right index implementation, but for CUSTOM indexes that's done explicitly. Another argument is a practical one; when we store that info in schema tables it's always been with assumption of a single target column, so we have never needed to explicitly link the qualifier to a column. Its representation is as a key without a value; in pre-6717 schema in the json string in {{system.schema_columns.index_options}}, and in 3.0 in the native map {{system_schema.indexes.options}}.

h6. Row based custom index DDL
{code}
CREATE CUSTOM INDEX ON test_table USING 'com.foo.bar.CustomIndex';
CREATE CUSTOM INDEX ON test_table() USING 'com.foo.bar.CustomIndex';
CREATE CUSTOM ROW INDEX ON test_table USING 'com.foo.bar.CustomIndex';
{code}
Personally, I prefer the first option for simplicity.

h6. Select syntax

There are 2 parts to this, modifying the relations syntax so to extend beyond the usual {{<identifier> <operator> <term>}} syntax and adding a means to force (or at least suggest) a specific index be used for a particular query. Both are required for true row based indexes, but I think the latter could also be used with single and multicolumn indexes.

For providing an explicit instruction to C* that it should bypass the usual index selection mechanisms and force the use of a particular index I propose adding an optional {{USING INDEX}} to the {{SELECT}} syntax. The change to decouple a query term from a particular column identifier could be as simple as just allowing a standalone string literal in the {{WHERE}} clause, or if we want to be more explicit we could introduce a pseudo function in the system keyspace along the lines of {{token()}}.

To take one of the examples from the [Stratio docs|https://github.com/Stratio/cassandra-lucene-index]:

{code}
SELECT * FROM tweets WHERE lucene='{
    filter : {type:"range", field:"time", lower:"2014/04/25", upper:"2014/05/1"},
    query  : {type:"phrase", field:"body", value:"big data gives organizations", slop:1}
}' 
{code}

The {{lucene}} field is a dummy column that has to be added to the table in order to associate the pre-3.0 row-based index with the {{tweets}} table. We could rewrite this query as:

{code}
SELECT * FROM tweets 
WHERE expr('{filter : {type:"range", field:"time", lower:"2014/04/25", upper:"2014/05/1"},
             query  : {type:"phrase", field:"body", value:"big data gives organizations", slop:1}}') 
USING INDEX lucene;
{code}

In this version {{lucene}} is the name of the index, rather than a column and the query string itself is wrapped in the {{expr()}} function. 

As mentioned, we could also allow {{USING INDEX}} with the usual relation syntax, for example:
{code}
CREATE TABLE ks.t1(k int, v1 int, v2 int, PRIMARY KEY (k));
CREATE INDEX v1_idx ON ks.t1(v1);
CREATE INDEX v2_idx ON ks.t1(v2);
CREATE CUSTOM INDEX v1_v2_idx ON ks.t1(v1, v2) USING 'com.foo.bar.CustomMultiColumnIndex';
{code}

There are couple of things to consider about querying this table using the indexes:
{code}
SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 ALLOW FILTERING;
{code}

* Even though we have an index which covers both {{v1}} and {{v2}}, there's no guarantee that the query will use it, as the either of the regular indexes may be considered more selective under the current scheme (or even equally selective, in which case the choice is undefined). 
* Even if we *only* have the custom index we still need to {{ALLOW FILTERING}} as the index expressions are only ever considered individually, so if there is > 1 {{ALLOW FILTERING}} is always required.

{code}
SELECT * FROM ks.t1 WHERE v1=0 AND v2=0 USING v1_v2_idx;
{code}

We could allow {{USING INDEX}} both to force the choice of index and to override the requirement for {{ALLOW FILTERING}}. Perhaps the {{ALLOW FILTERING}} override could be dependent on all the restrictions being able to besatisfied by the specified index. Anyway, it seems that something akin to Oracle's [index hinting|http://docs.oracle.com/cd/B19306_01/server.102/b14211/hintsref.htm#CHDJDIAH] might be generally useful for power users and/or during development.

> Full support for multi-column and per-row indexes
> -------------------------------------------------
>
>                 Key: CASSANDRA-10124
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10124
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>              Labels: client-impacting
>             Fix For: 3.0 beta 2
>
>
> Since CASSANDRA-6717 decoupled a secondary index from a single column, we can expand support for indexes with multiple target columns and for row-based indexes with truly dynamic targets.  
> Much of the plumbing for this has been done in CASSANDRA-7771, CASSANDRA-6717 & by the API rework in CASSANDRA-9459. What remains is:
> * Decide on syntax for DDL statements
> * Decide on WHERE clause syntax for SELECT (there is some discussion on this in the comments of CASSANDRA-9459)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)