Posted to commits@cassandra.apache.org by "Todd Nine (JIRA)" <ji...@apache.org> on 2010/10/10 23:21:31 UTC

[jira] Created: (CASSANDRA-1599) Add paging support for secondary indexing

Add paging support for secondary indexing
-----------------------------------------

                 Key: CASSANDRA-1599
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1599
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Todd Nine
             Fix For: 0.7.0


For many users, paging is a standard use case in web applications. It would be nice to allow paging as part of a boolean expression.

Page:
 * start index
 * end index
 * page timestamp
 * sort order


When sorting, is it possible to sort both ASC and DESC? 
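A minimal sketch of the paging request described above, applied to an in-memory list of index hits. Page, SortOrder, and page_rows are hypothetical names, not part of any Cassandra API. The meaning of "page timestamp" is an assumption: here it hides rows written after that timestamp, so later pages see the same snapshot.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class SortOrder(Enum):
    ASC = "asc"
    DESC = "desc"


@dataclass(frozen=True)
class Page:
    # Hypothetical request mirroring the fields listed in the ticket.
    start_index: int     # inclusive offset into the sorted result
    end_index: int       # exclusive offset into the sorted result
    page_timestamp: int  # assumption: rows written after this are ignored
    sort_order: SortOrder = SortOrder.ASC


# (index_value, row_key, write_timestamp)
Row = Tuple[str, str, int]


def page_rows(rows: List[Row], page: Page) -> List[Row]:
    """Return one page of rows ordered by index value, ties broken by row key."""
    visible = [r for r in rows if r[2] <= page.page_timestamp]
    ordered = sorted(visible, key=lambda r: (r[0], r[1]),
                     reverse=page.sort_order is SortOrder.DESC)
    return ordered[page.start_index:page.end_index]
```

For example, with rows `[("a", "k1", 1), ("c", "k2", 2), ("b", "k3", 3), ("d", "k4", 9)]`, `Page(0, 2, 5)` returns the rows for k1 and k3, and the same page with `SortOrder.DESC` returns k2 and k3. The row for k4 stays hidden because it was written after timestamp 5.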


            



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-1599) Add sort/order support for secondary indexing

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1599:
--------------------------------

    Fix Version/s:     (was: 0.7.0)
                   0.8
          Summary: Add sort/order support for secondary indexing  (was: Add paging support for secondary indexing)

Another issue with local indexes is that implementing sorting would involve a cluster-wide merge sort. A distributed index is required to efficiently return the data in index order. I think this issue should be deferred to 0.8.0, when we will have distributed indexes available: the indexes available in 0.7.0 are intended for filtering data.
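To make the cluster-wide merge sort concrete: if each node indexes only its own rows, the coordinator has to pull a sorted stream from every node and merge them before it can return even the first page in index order. A minimal sketch, with in-memory lists standing in for hypothetical per-node local indexes (no Cassandra API is used):

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

# (index_value, row_key)
Entry = Tuple[str, str]


def first_in_index_order(local_indexes: Dict[str, List[Entry]], count: int) -> List[Entry]:
    """Merge per-node index entries and return the first `count` in global order.

    Any node might hold the globally smallest value, so every node has to
    contribute a sorted stream; heapq.merge then performs the k-way merge.
    """
    # Local indexes are already sorted on disk; sorting here only keeps the
    # sketch correct for unsorted test input.
    streams = [sorted(entries) for entries in local_indexes.values()]
    return list(islice(heapq.merge(*streams), count))
```

For example, with node A holding apple and pear, node B holding banana, and node C holding cherry and zucchini, asking for three entries returns apple, banana, and cherry, one from each of the three nodes.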

As a multi-part solution, IMO we should:
 # (optionally) Rename local indexes to "filter_indexes" or "filters"
 # Expose 0.8.0 distributed indexes as read-only column families which are sorted by the index value, and which are queried using get_range_slices
 # Implement LT/LTE/GT/GTE operations for the key range in get_range_slices

Outcomes:
 * Your "primary" index expression would be consistently queried using the "range" parameter in get_range_slices and would define the sort order
 * "filters" (0.7.0 secondary indexes) would be applied using the IndexClause argument as described in CASSANDRA-1600

I'm going to open another ticket to suggest some changes to index definitions to make this consistent.
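The proposal asks for LT/LTE/GT/GTE bounds on the key range in get_range_slices. The ticket does not say how they would be spelled in the API, so the parameter names below are hypothetical; the sketch only shows the intended semantics over a sorted list of keys, using binary search to locate each bound.

```python
import bisect
from typing import List, Optional


def key_range(sorted_keys: List[str],
              start: Optional[str] = None, start_inclusive: bool = True,
              end: Optional[str] = None, end_inclusive: bool = True) -> List[str]:
    """Return the keys between start and end, honoring GT/GTE and LT/LTE."""
    lo, hi = 0, len(sorted_keys)
    if start is not None:
        # GTE keeps keys equal to start; GT skips past them.
        find = bisect.bisect_left if start_inclusive else bisect.bisect_right
        lo = find(sorted_keys, start)
    if end is not None:
        # LTE keeps keys equal to end; LT stops before them.
        find = bisect.bisect_right if end_inclusive else bisect.bisect_left
        hi = find(sorted_keys, end)
    return sorted_keys[lo:hi]
```

With keys `["a", "b", "c", "d"]`, GTE "b" and LT "d" select `["b", "c"]`, while GT "b" and LTE "d" select `["c", "d"]`. An empty range (start after end) simply yields no keys.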




[jira] Issue Comment Edited: (CASSANDRA-1599) Add sort/order support for secondary indexing

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919700#action_12919700 ] 

Stu Hood edited comment on CASSANDRA-1599 at 10/10/10 11:20 PM:
----------------------------------------------------------------

Another issue with local indexes is that implementing sorting would involve a cluster-wide merge sort. A distributed index is required to efficiently return the data in index order. I think this issue should be deferred to 0.8.0, when we will have distributed indexes available: the indexes available in 0.7.0 are intended for filtering data.

As a multi-part solution, IMO we should:
 # (optionally) Rename local indexes to "filter_indexes" or "filters"
 # Expose 0.8.0 distributed indexes as read-only column families which are sorted by the index value, and which are queried using get_range_slices
 # Implement LT/LTE/GT/GTE operations for the key range in get_range_slices

Outcomes:
 * Your "primary" index expression would be consistently queried using the "range" parameter in get_range_slices and would define the sort order
 * "filters" (0.7.0 secondary indexes) would be applied using the IndexClause argument as described in CASSANDRA-1600

I'm going to open another ticket to suggest some changes to index definitions to make this consistent.




[jira] Commented: (CASSANDRA-1599) Add paging support for secondary indexing

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919684#action_12919684 ] 

Jonathan Ellis commented on CASSANDRA-1599:
-------------------------------------------

How is this different from IndexClause.start_key?
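For reference, this is roughly how a client pages with a start key for a single index clause. It is a sketch under assumptions: fetch_from and iterate_pages are hypothetical stand-ins rather than the Thrift call, keys are ordered lexically where a real cluster would use partitioner order, and start_key is assumed to be inclusive, so each page after the first re-fetches the previous page's last key and drops it.

```python
from typing import Callable, Iterator, List

# A query takes (start_key, count) and returns up to `count` keys >= start_key.
Query = Callable[[str, int], List[str]]


def fetch_from(keys: List[str], start_key: str, count: int) -> List[str]:
    """Hypothetical stand-in for an index query whose start key is inclusive."""
    return [k for k in sorted(keys) if k >= start_key][:count]


def iterate_pages(query: Query, page_size: int) -> Iterator[List[str]]:
    """Yield successive pages, reusing the last key seen as the next start key."""
    start_key, first_page = "", True
    while True:
        # After the first page, ask for one extra row: the inclusive start key
        # returns the previous page's last row again.
        batch = query(start_key, page_size if first_page else page_size + 1)
        if not first_page and batch and batch[0] == start_key:
            batch = batch[1:]  # already returned on the previous page
        batch = batch[:page_size]
        if not batch:
            return
        yield batch
        if len(batch) < page_size:
            return  # a short page means the index is exhausted
        start_key, first_page = batch[-1], False
```

With keys k1 through k5 and a page size of 2, this yields `["k1", "k2"]`, `["k3", "k4"]`, and `["k5"]`.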



[jira] Updated: (CASSANDRA-1599) Add paging support for secondary indexing

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1599:
--------------------------------

    Component/s: API



[jira] Commented: (CASSANDRA-1599) Add paging support for secondary indexing

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919695#action_12919695 ] 

Stu Hood commented on CASSANDRA-1599:
-------------------------------------

This ticket should probably be titled "allow sorting by index value", since that is not yet possible, and the paging concerns are not valid until it is implemented.



[jira] Issue Comment Edited: (CASSANDRA-1599) Add paging support for secondary indexing

Posted by "Todd Nine (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919687#action_12919687 ] 

Todd Nine edited comment on CASSANDRA-1599 at 10/10/10 7:13 PM:
----------------------------------------------------------------

Consider a query similar to the following. 


email == 'bob@gmail.com' && (lastlogindate > today - 5 days || newmessagedate > today - 1 day)

Which start key do I advance: one of them, or both? As a client, I would have to iterate over every field in the expression tree to determine the start key for each of the two index clauses. While this is not impossible, it becomes very complex for large boolean operand trees. This functionality would give users a clean interface that spares them from analyzing the previous result set and "diffing" it against the expression tree. I'm not saying it's an absolute must-have, but it would certainly appeal to users who rely on Cassandra as an eventually consistent storage mechanism for web-based applications, once union and intersection are implemented in Cassandra.
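A minimal sketch of the client-side bookkeeping for the OR part of the example, assuming each clause can be queried on its own for matching row keys that come back in one shared key order. branch_from and or_page are hypothetical helpers over in-memory key lists, not Cassandra APIs. Each page fetches up to a page's worth of keys from every branch, merges them, and drops rows that matched both branches.

```python
import heapq
from typing import Callable, Iterable, List, Optional

# A branch query returns up to `count` matching row keys strictly after `after`.
BranchQuery = Callable[[Optional[str], int], List[str]]


def branch_from(matching: Iterable[str]) -> BranchQuery:
    """Hypothetical stand-in for one index clause, e.g. lastlogindate > X."""
    keys = sorted(matching)

    def query(after: Optional[str], count: int) -> List[str]:
        return [k for k in keys if after is None or k > after][:count]

    return query


def or_page(branches: List[BranchQuery], after: Optional[str], page_size: int) -> List[str]:
    """One page of the union of several clauses, in row-key order."""
    # Any single branch could supply the whole page, so each returns page_size keys.
    streams = [branch(after, page_size) for branch in branches]
    page: List[str] = []
    for key in heapq.merge(*streams):
        if page and page[-1] == key:
            continue  # the row matched more than one clause; emit it once
        page.append(key)
        if len(page) == page_size:
            break
    return page
```

Here the last key of a page is enough to resume every branch, because all branches share one key order. Nested AND/OR trees need this merge (plus an intersection counterpart) at every level, which is the bookkeeping the comment would like to push to the server.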



[jira] Issue Comment Edited: (CASSANDRA-1599) Add paging support for secondary indexing

Posted by "Todd Nine (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919687#action_12919687 ] 

Todd Nine edited comment on CASSANDRA-1599 at 10/10/10 7:13 PM:
----------------------------------------------------------------

Consider a query similar to the following. 


email == 'bob@gmail.com' && (lastlogindate > today - 5 days || newmessagedate > today - 1 day)

Which start key do I advance: one of them, or both? As a client, I would have to iterate over every field in the expression tree to determine the start key for each of the two index clauses. While this is not impossible, it becomes very complex for large boolean operand trees. This functionality would give users a clean interface that spares them from analyzing the previous result set and "diffing" it against the expression tree. I'm not saying it's an absolute must-have, but it would certainly appeal to users who rely on Cassandra as an eventually consistent storage mechanism for web-based applications, once union and intersection are implemented server side.

      was (Author: tnine):
    Consider a query similar to the following. 


email == 'bob@gmail.com' && (lastlogindate > today - 5 days || newmessagedate > today - 1 day)

Which start key do I advance: one of them, or both? As a client, I would have to iterate over every field in the expression tree to determine the start key for each of the two index clauses. While this is not impossible, it becomes very complex for large boolean operand trees. This functionality would give users a clean interface that spares them from analyzing the previous result set and "diffing" it against the expression tree. I'm not saying it's an absolute must-have, but it would certainly appeal to users who rely on Cassandra as an eventually consistent storage mechanism for web-based applications.
  


[jira] Issue Comment Edited: (CASSANDRA-1599) Add sort/order support for secondary indexing

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919916#action_12919916 ] 

Stu Hood edited comment on CASSANDRA-1599 at 10/11/10 2:32 PM:
---------------------------------------------------------------

>> column families which are sorted by the index value
> This is the hard part, I am having trouble wrapping my mind
RandomPartitioner would need to use wide rows for a distributed index, but order-preserving partitioners could use skinny rows.

EDIT: I guess I'm also assuming CASSANDRA-1205, where index values can be converted to a byte[] collation key.
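As an illustration of a byte[] collation key (not necessarily the encoding CASSANDRA-1205 proposes), a signed 64-bit integer can be made to sort correctly under plain unsigned bytewise comparison by writing it big-endian with the sign bit flipped. A minimal sketch; int_collation_key is a hypothetical name:

```python
import struct


def int_collation_key(value: int) -> bytes:
    """Encode a signed 64-bit int so bytewise order matches numeric order.

    Big-endian puts the most significant byte first, and flipping the sign
    bit makes negative numbers sort before non-negative ones.
    """
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Adding 2**63 is the same as flipping the two's-complement sign bit.
    return struct.pack(">Q", value + (1 << 63))
```

Sorting the encoded keys as raw bytes then gives the same order as sorting the original integers, which is what a comparator-free index column name would need.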

> As suggested by "adding a CF index type" I think using the same index query api makes more sense.
The comment above assumes that CASSANDRA-1600 is a good idea, in which case we would be using the same index query API: get_range_slices.



[jira] Commented: (CASSANDRA-1599) Add sort/order support for secondary indexing

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919916#action_12919916 ] 

Stu Hood commented on CASSANDRA-1599:
-------------------------------------

>> column families which are sorted by the index value
> This is the hard part, I am having trouble wrapping my mind
RandomPartitioner would need to use wide rows for a distributed index, but order-preserving partitioners could use skinny rows.
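A minimal sketch of the wide-row versus skinny-row difference, under simplifying assumptions: a RandomPartitioner-style token is modeled as an MD5 hash of the row key (the real token computation differs in detail), while an order-preserving partitioner uses the key itself. If each index value were its own skinny row, a range scan under hashing would visit values in hash order, so the sorted index has to live in the columns of one wide row, which stay in comparator order. All names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def md5_token(key: bytes) -> int:
    """Simplified RandomPartitioner-style token: the MD5 digest as an integer."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def skinny_scan_order(index_values: List[bytes], order_preserving: bool) -> List[bytes]:
    """Order in which a token-range scan visits one-row-per-value index rows."""
    if order_preserving:
        return sorted(index_values)  # the token is the key, so scans follow value order
    return sorted(index_values, key=md5_token)  # scans follow hash order instead


def wide_row_slice(entries: List[Tuple[bytes, bytes]], count: int) -> List[Tuple[bytes, bytes]]:
    """Slice of a single wide index row whose columns are (index value, row key).

    Columns within a row stay sorted by the comparator, so a column slice
    returns entries in index-value order whatever the partitioner.
    """
    return sorted(entries)[:count]
```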

> As suggested by "adding a CF index type" I think using the same index query api makes more sense.
The comment above assumes that CASSANDRA-1600 is a good idea, in which case we would be using the same index query API: get_range_slices.



[jira] Commented: (CASSANDRA-1599) Add sort/order support for secondary indexing

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919908#action_12919908 ] 

Jonathan Ellis commented on CASSANDRA-1599:
-------------------------------------------

This is sort of along the lines of what I was thinking, although I think just adding a ColumnFamily index type would be adequate to distinguish it.

{code}
column families which are sorted by the index value
{code}

This is the hard part: I am having trouble wrapping my mind around a scheme that both allows different sort orders in different CFs and always routes things correctly by node token, no matter which CF you are talking about.
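For context on routing by node token: each node owns the token range from its predecessor's token (exclusive) up to its own token (inclusive), so routing depends only on the token computed from a row key. A minimal sketch of that lookup on a hypothetical ring of integer tokens (owner is not a Cassandra API):

```python
import bisect
from typing import List


def owner(node_tokens: List[int], token: int) -> int:
    """Return the token of the node whose range (previous, own] contains `token`."""
    ring = sorted(node_tokens)
    i = bisect.bisect_left(ring, token)  # first node token >= token
    return ring[i % len(ring)]  # past the largest token, wrap around to the first node
```

The lookup is the same for every column family. The difficulty raised above is that an index CF sorted by value would need tokens derived from index values, while the rows it points to are placed by their own keys' tokens, so the two generally land on different nodes.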

{code}
which are queried using get_range_slices
{code}

As suggested by "adding a CF index type", I think using the same index query API makes more sense.



[jira] Issue Comment Edited: (CASSANDRA-1599) Add sort/order support for secondary indexing

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919908#action_12919908 ] 

Jonathan Ellis edited comment on CASSANDRA-1599 at 10/11/10 1:13 PM:
---------------------------------------------------------------------

This is sort of along the lines of what I was thinking, although I think just adding a ColumnFamily index type would be adequate to distinguish it.

bq. column families which are sorted by the index value

This is the hard part: I am having trouble wrapping my mind around a scheme that both allows different sort orders in different CFs and always routes things correctly by node token, no matter which CF you are talking about.

bq. which are queried using get_range_slices

As suggested by "adding a CF index type", I think using the same index query API makes more sense.



[jira] Issue Comment Edited: (CASSANDRA-1599) Add paging support for secondary indexing

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919695#action_12919695 ] 

Stu Hood edited comment on CASSANDRA-1599 at 10/10/10 10:44 PM:
----------------------------------------------------------------

This ticket should probably be titled "allow sorting by index value", since that is not yet possible, and the paging concerns are not valid until it is implemented.
