Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2013/12/12 00:16:07 UTC

[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

    [ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845819#comment-13845819 ] 

Jonathan Ellis commented on CASSANDRA-6477:
-------------------------------------------

The most straightforward design mirrors what we do for our local indexes:

# At insert/update time, add a new index entry (as part of an atomic batch with the original update), with the timestamp of the data cell
# At read time, fetch the rows indicated by the index and remove stale index entries.  Since we delete with the same timestamp as the index entry, this is safe with respect to concurrent updates
# We can still use compaction of the base table to clean out stale records, but this will now generate updates or hints to the index partition

The big drawback is that reads require an O(N) multiget in the coordinator: reading the index entries is a single request, but then each row to fetch may be on a different replica.
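A rough sketch of that read fan-out, under loud assumptions: owner() is a hypothetical stand-in that places keys with an MD5 hash modulo the node count (not the real token ring or replication strategy), and read_plan() simply lists the requests a coordinator would issue.

```python
import hashlib


def owner(key, num_nodes):
    # Hypothetical placement: hash the key and take it modulo the node count.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def read_plan(index_value, matching_keys, num_nodes):
    # One request reads the index partition for the indexed value...
    plan = [("index", owner(index_value, num_nodes), index_value)]
    # ...then one fetch per matching row, each possibly on a different node.
    for key in matching_keys:
        plan.append(("row", owner(key, num_nodes), key))
    return plan


keys = ["user%d" % i for i in range(50)]
plan = read_plan("red", keys, num_nodes=10)
nodes_touched = {node for _, node, _ in plan}
```

With N matching rows the plan always has 1 + N requests, and nodes_touched can grow up to the cluster size, which is why this design pays off only when each indexed value maps to very few rows.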

Put another way, this will give us indexes that are good at very high cardinality -- ideally a single row for each indexed value -- to go with our existing low-cardinality indexes, but we still have a hole for "medium cardinality" data.

> Partitioned indexes
> -------------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 3.0
>
>
> Local indexes are suitable for low-cardinality data, where spreading the index across the cluster is a Good Thing.  However, for high-cardinality data, local indexes require querying most nodes in the cluster even if only a handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)