Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/10/11 19:21:33 UTC

[jira] Updated: (CASSANDRA-1601) Refactor index definitions

     [ https://issues.apache.org/jira/browse/CASSANDRA-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1601:
--------------------------------------

         Priority: Major  (was: Critical)
    Fix Version/s:     (was: 0.7.0)
                   0.8

This is a huge amount of feature creep to jam in at the end of 0.7.  (Nor do I think indexing supercolumn data is even desirable.)  Pushing to 0.8.

> Refactor index definitions
> --------------------------
>
>                 Key: CASSANDRA-1601
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1601
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>            Reporter: Stu Hood
>             Fix For: 0.8
>
>
> h3. Overview
> There are a few considerations for defining secondary indexes and row validation that I don't think have been brought up yet. While the interface is still malleable before 0.7.0, we should attempt to make changes that allow for forward compatibility of index/validator schemas. This is an umbrella ticket for suggesting/debating the changes: other tickets should be opened for quick improvements that can be made before 0.7.0.
> ----
> h3. Index output types
> The output (queryable) data from an indexing operation is what actually goes in the index. For a particular row, the output can be either _single-valued_, _multi-valued_ or _compound_:
> * Single-valued
> ** Implemented in trunk (special case of multi-valued)
> * Multi-valued
> ** Multiple index values _of the same type_ can match a single row
> ** Row probably contains a list/set (perhaps in a supercolumn)
> * Compound
> ** Multiple base properties concatenated as one index entry 
> ** Different validators/comparators for each component
> ** (Given the simplicity of performing boolean operations on CASSANDRA-1472 indexes, compound local indexes are unlikely to ever be worthwhile, but compound distributed indexes will be: see comments on CASSANDRA-1599)
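To make the three output shapes concrete, here is a minimal Java sketch. It assumes a row can be modelled as a sorted map from column name to value and an index entry as a list of components. `IndexOutputSketch` and its methods are hypothetical illustrations, not code from Cassandra trunk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;

// Hypothetical sketch (not Cassandra code): a row is a sorted map of
// column name -> value, and an index entry is a list of components.
final class IndexOutputSketch {

    // Multi-valued: each selected column present in the row yields its own
    // single-component entry, so one row can match several index values.
    static List<List<String>> multiValued(SortedMap<String, String> row, List<String> columns) {
        List<List<String>> entries = new ArrayList<>();
        for (String column : columns) {
            String value = row.get(column);
            if (value != null) {
                entries.add(Collections.singletonList(value));
            }
        }
        return entries;
    }

    // Single-valued: the special case of multi-valued with exactly one column.
    static List<List<String>> singleValued(SortedMap<String, String> row, String column) {
        return multiValued(row, Collections.singletonList(column));
    }

    // Compound: several base columns concatenated into ONE entry; the row
    // produces no entry at all if any component is missing.
    static List<List<String>> compound(SortedMap<String, String> row, List<String> columns) {
        List<String> components = new ArrayList<>();
        for (String column : columns) {
            String value = row.get(column);
            if (value == null) {
                return Collections.emptyList();
            }
            components.add(value);
        }
        return Collections.singletonList(components);
    }

    // Compound entries are ordered component by component, each component
    // using its own comparator (i.e. its own validator/comparator type).
    static Comparator<List<String>> compoundOrder(List<Comparator<String>> perComponent) {
        return (a, b) -> {
            for (int i = 0; i < perComponent.size(); i++) {
                int cmp = perComponent.get(i).compare(a.get(i), b.get(i));
                if (cmp != 0) {
                    return cmp;
                }
            }
            return 0;
        };
    }
}
```

With per-component comparators, a compound entry over (city, zip) could sort the first part as text and the second numerically, so `["a", "10"]` sorts after `["a", "9"]` even though the raw strings would not.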
> h3. Index input types
> The other end of indexing is selection of values from a row to be indexed. Selection can correspond directly to our current {{db.filter.*}} implementations, and may be best implemented by specifying the validator/index using the same Thrift objects you would use for a similar query:
> * Name selection
> ** Implemented in trunk, but should probably just be a special case of list selection below
> ** Corresponds to db.filter.NamesQueryFilter of size 1
> * List selection
> ** Should specify a list of columns of which all values must be of the same type, as defined by the Validator
> ** Corresponds to db.filter.NamesQueryFilter
> * Range (prefix?) selection
> ** Subsets of a row may be interesting for indexing
> ** Range corresponds to db.filter.SliceQueryFilter
> *** (A prefix might actually be more useful for indexing, but is better implemented by indexing an arbitrarily nested row)
> ** Open question: might the ability to index only the 'top N values' from a row be useful? If so, then this selector should allow N to be specified like it would be for a slice
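The selectors above could look roughly like the following Java sketch. The ticket suggests reusing the Thrift query objects; this sketch uses plain collections instead. `ColumnSelector`, `NamesSelector` and `RangeSelector` are hypothetical names that only mirror the behaviour of `NamesQueryFilter` and `SliceQueryFilter`. The "empty bound means unbounded" rule and the use of `count` for the "top N" question are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical sketch, not Cassandra's API: selectors choosing which columns
// of a row feed an index, mirroring the two db.filter query shapes.
interface ColumnSelector {
    SortedMap<String, String> select(NavigableMap<String, String> row);
}

// List selection (cf. NamesQueryFilter). Every selected value must pass the
// same validator, because the index holds values of a single type.
final class NamesSelector implements ColumnSelector {
    private final List<String> names;
    private final Predicate<String> validator;

    NamesSelector(List<String> names, Predicate<String> validator) {
        this.names = names;
        this.validator = validator;
    }

    // Name selection is simply list selection of size 1.
    static NamesSelector single(String name, Predicate<String> validator) {
        return new NamesSelector(Collections.singletonList(name), validator);
    }

    @Override
    public SortedMap<String, String> select(NavigableMap<String, String> row) {
        SortedMap<String, String> out = new TreeMap<>();
        for (String name : names) {
            String value = row.get(name);
            if (value == null) {
                continue; // absent columns are simply not indexed
            }
            if (!validator.test(value)) {
                throw new IllegalArgumentException("invalid value for " + name + ": " + value);
            }
            out.put(name, value);
        }
        return out;
    }
}

// Range selection (cf. SliceQueryFilter): columns between start and finish
// (inclusive, in column order), keeping at most `count` of them, i.e. the
// "top N". An empty bound means unbounded on that side (assumed here).
final class RangeSelector implements ColumnSelector {
    private final String start;
    private final String finish;
    private final int count;

    RangeSelector(String start, String finish, int count) {
        this.start = start;
        this.finish = finish;
        this.count = count;
    }

    @Override
    public SortedMap<String, String> select(NavigableMap<String, String> row) {
        NavigableMap<String, String> range = row;
        if (!start.isEmpty()) {
            range = range.tailMap(start, true);
        }
        if (!finish.isEmpty()) {
            range = range.headMap(finish, true);
        }
        SortedMap<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> entry : range.entrySet()) {
            if (out.size() >= count) {
                break;
            }
            out.put(entry.getKey(), entry.getValue());
        }
        return out;
    }
}
```

Because both selectors return the same shape, an index definition could hold either one without caring which kind of filter it mirrors.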
> h3. Supercolumns/arbitrary-nesting
> Another consideration is that we should be able to support indexing and validation of supercolumns (and hence, arbitrarily nested rows). Since the selection of columns to index is essentially the same as the selection of columns to return for a query, this can probably mirror (and suggest improvements to) our query API.
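One way to picture indexing nested data is to reuse a query's path through the nesting, as in this Java sketch. It assumes a row is a map whose values are either plain string columns or nested maps, with a supercolumn being one level of nesting. `NestedSelector` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical sketch: an arbitrarily nested row where each value is either a
// String (a plain column) or another NavigableMap (a supercolumn / sub-row).
final class NestedSelector {

    // Follow a path of column names down the nesting. Returns null if a step
    // is missing or lands on a plain column instead of a sub-row.
    @SuppressWarnings("unchecked")
    static NavigableMap<String, Object> descend(NavigableMap<String, Object> row, List<String> path) {
        NavigableMap<String, Object> current = row;
        for (String name : path) {
            Object child = current.get(name);
            if (!(child instanceof NavigableMap)) {
                return null;
            }
            current = (NavigableMap<String, Object>) child;
        }
        return current;
    }

    // Collect the plain (String) values directly under the path, in column
    // order: the same columns a query for that path would return.
    static List<String> valuesAt(NavigableMap<String, Object> row, List<String> path) {
        NavigableMap<String, Object> target = descend(row, path);
        if (target == null) {
            return Collections.emptyList();
        }
        List<String> values = new ArrayList<>();
        for (Object value : target.values()) {
            if (value instanceof String) {
                values.add((String) value);
            }
        }
        return values;
    }
}
```

An empty path selects the top-level columns, so ordinary (non-nested) indexing falls out as the depth-zero case.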
> h3. UDFs
> This is obviously still an open area, but user defined indexing functions are essentially a transform between the _input_ and _output_ (as defined above), which would normally have equal structures. Leaving room for UDFs in our index definitions makes sense, and will likely lead to a much more general and elegant design.
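A UDF then slots in between selection and indexing. The following Java sketch assumes the selected columns arrive as a sorted map and that an index definition carries a `java.util.function.Function`. The default transform passes values through unchanged, so input and output share a structure. The hypothetical `SPLIT_LOWER` function turns one comma-separated column into several lower-cased index values. All names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedMap;
import java.util.function.Function;

// Hypothetical sketch: an index definition maps the selected input columns to
// output index values through a pluggable function. Not Cassandra's API.
final class IndexUdfSketch {

    // Default transform: each selected value is indexed as-is, so the
    // output has the same structure as the input.
    static final Function<SortedMap<String, String>, List<String>> IDENTITY =
            selected -> new ArrayList<>(selected.values());

    // Example UDF: split comma-separated values and lower-case them, so a
    // single input column can yield several index values.
    static final Function<SortedMap<String, String>, List<String>> SPLIT_LOWER = selected -> {
        List<String> out = new ArrayList<>();
        for (String value : selected.values()) {
            for (String part : value.split(",")) {
                String normalized = part.trim().toLowerCase(Locale.ROOT);
                if (!normalized.isEmpty()) {
                    out.add(normalized);
                }
            }
        }
        return out;
    };

    // Apply whichever transform the (hypothetical) index definition names.
    static List<String> indexValues(SortedMap<String, String> selected,
                                    Function<SortedMap<String, String>, List<String>> udf) {
        return udf.apply(selected);
    }
}
```

Keeping the identity transform as the default means index definitions without a UDF need no special case.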
