You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jonathan Ellis (Created) (JIRA)" <ji...@apache.org> on 2012/02/10 06:19:59 UTC

[jira] [Created] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Support multiple ranges in SliceQueryFilter
-------------------------------------------

                 Key: CASSANDRA-3885
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
             Project: Cassandra
          Issue Type: Sub-task
            Reporter: Jonathan Ellis




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288972#comment-13288972 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

Thanks for reviewing and for your input Vijay. 

Now I understand what you were saying, but cacheing is used in a different circumstance. In fact the IndexedReader assumes that the query has already been adequately rewritten and that ranges are in order (either forward or reverse) and non-overlapping.

Cacheing is used in the following, very particular, circumstance, lets say we the following columns: a, b, c, d, e, f and that index segments cover the columns as such i1=[a-c] i2=[e-f] and that we want the following (reversed) ranges [f-e] [d-b]. We start by looking for the index segment for "f" which falls into i2 and we'll deserialize columns d through f. Now while the the current range we're looking for only needs "e" and "f" we have also deserialized "d". Had we not cached "d" we would have to re-read the "d" to handle the next range (now [d-b]), but because we cached it we don't. In general if the next range requires any of the data we've already read we're sure to have it in cache and won't need to re-read, if it doesn't we clear the cache.

As you can see it's a very particular optimization for indexed, reverse reads.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Jonathan Ellis (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-3885:
-----------------------------------------

    Assignee:     (was: Todd Nine)
    
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292618#comment-13292618 ] 

Vijay commented on CASSANDRA-3885:
----------------------------------

SliceQueryFilter.serializer needs to deal with Multiple ColumnSlice's or at-least we have to create a ticket to followup.

I am confused with the patch a bit... Comments on IndexedSliceReader says
{quote}
forward: [a,b],[d,e],[g,h] reverse: [h,g],[e,d],[b,a].
{quote}

but why do we need to do this?

{code}
            if (reversed && readBackwards)
                currentSlice--;
            else
                currentSlice++;
{code}

In IndexedSliceReader itself is a iterator and within which there is Simple and IndexedFetcher which is also kind of a iterator ... it might be better to make IndexedSliceReader as abstract class and add functionality into classes. 
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3885:
----------------------------------------

    Attachment: 3885-v2.txt

Patch needed rebase. Rebased version attached. I've also pushed it at https://github.com/pcmanus/cassandra/tree/3885.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 3885-v2.txt, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287594#comment-13287594 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

Hi Vijay

 Sorry for the delay also, somehow I didn't get notified that the issue was updated/commented.
 1) - With regard to SliceRange you're right, I just wanted to get some feedback before changing thrift stuff
 2) - We can definitely avoid that, maybe make IndexedSliceReader receive the ranges as an array?
 3) - Hadn't thought about that... maybe we could have a internal (within org.apache.cassandra.db) version and transform?
 4) - Cacheing the results is not about query validation, at least I can't see how better query validation would discard the necessity of cacheing to avoid re-reads. There is always a scenario where cacheing the results avoids re-reads (when we're sure that the next reversed range will need them), the thing is that right now we're cacheing without knowing that they wether they will be necessary (vs making sure the previous range will need them). In any case the maximum that gets cached is an index range of records per query so that might not be a terrible issue. What do you think?
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Jonathan Ellis (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-3885:
-----------------------------------------

    Assignee: Todd Nine
    
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Todd Nine
>             Fix For: 1.2
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SQF, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3885:
--------------------------------------

    Reviewer: vijay2win@yahoo.com
    Assignee: David Alves  (was: Vijay)
    
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289220#comment-13289220 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

Thank you for your input Sylvain

That crossed my mind. 

The thing is we have to deserialize "d" anyway. So the question becomes what's faster/more optimized, keep comparing every column to see if it matches the previous range (the next range to be processed in reverse) doing two bb comparisons and then store it for later if yes (because we're not finished with the current range) or simply storing anyway it in the case it is needed.  

I figured the latter option would probably be faster, but I might be wrong.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289511#comment-13289511 ] 

Jonathan Ellis commented on CASSANDRA-3885:
-------------------------------------------

While I'm sympathetic to the time spent getting code working, one of the reasons we do code reviews is we have a strong bias towards doing things "right" as opposed to "good enough."  Granted, that can be taken too far, but I'm pretty happy with the balance we've achieved between improving what we commit on the one hand, and bikeshedding on the other.

In short, I'd like to see Sylvain's remaining concerns addressed.  Of course, reviewers aren't infallible either, and getting rid of the cache (for instance) may not actually produce much simplification, but in the interest of achieving consensus it's worth spending the time to try a second approach.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Alves updated CASSANDRA-3885:
-----------------------------------

    Attachment: CASSANDRA-3885.patch

much simple and readable patch (imo).

- adds a column slice object (gets rid of the nasty unchecked cast warnings)
- gets rid of the isColumnNeeded method and diminishes the number of bb comparisons


                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3885:
--------------------------------------

    Component/s: Core
    Description: 
This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.

We need to support multiple ranges in a SQF, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.

Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.

This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.
    
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SQF, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289250#comment-13289250 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

The new version of the code has one different in regard to the blockColumns queue, that's that it never adds values to it that aren't going to be necessary (as as done previously) so the blockColumns queue is not really a cache at all, all values there are returned (if computNext() is called, of course) that's why the "cache" is needed only for a (partial) index segment size and only in reverse. In fact I can add couple of bb checks and it is no longer a cache and simply stuff that hasn't been added to the blockColumns queue yet.

Do you think it's important to change the code to mimic how it behaved before?
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Alves updated CASSANDRA-3885:
-----------------------------------

    Attachment: CASSANDRA-3885.patch

corrected a bug where in reverse the index was being looked up based on the start of the slice and not on the finish.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289243#comment-13289243 ] 

Sylvain Lebresne commented on CASSANDRA-3885:
---------------------------------------------

bq. simply storing anyway it in the case it is needed.

Yeah but we kind of do that in the current code already with the queue {{blockColumns}}. {{blockColumns}} is a cache of sorts and that's why I think we shouldn't add a second one. Much like we currently do, during deserialization we can simply check if it's worth deserializing the next column of the block (which should only mean comparing each column once, against the "biggest" range intersecting the index block). Then, in computeNext(), when polling the column from {{blockColumns}} we can validate if the column is really needed (again, since we know the order in which we poll the column, this should only require comparison to one of the range). This should be roughly equivalent to the cache of your patch, but it avoids having both {{blockColumns}} and the cache.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288960#comment-13288960 ] 

Vijay commented on CASSANDRA-3885:
----------------------------------

Hi David,

{quote}
 Cacheing the results is not about query validation, at least I can't see how better query validation would discard the necessity of cacheing to avoid re-reads.
{quote}
We can validate and also rewrite the query, Example: user wants a query from {A:B-A:D, A:D-A:H, A:D-A:G} can be rewritten as {A:B-A:H}... End of the day the user needs a list of sorted columns.

In addition with the new patch: ColumnSlice can be a Pair<T1,T2>, (but I thought we also need count and isReverse in it?)

                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205247#comment-13205247 ] 

Jonathan Ellis commented on CASSANDRA-3885:
-------------------------------------------

(Most of the work will be pushed down into Memtable.getSliceIterator and SSTableSliceIterator.)
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 1.2
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SQF, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292686#comment-13292686 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

clarified, thanks.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395821#comment-13395821 ] 

Sylvain Lebresne commented on CASSANDRA-3885:
---------------------------------------------

Be sure to check against http://git-wip-us.apache.org/repos/asf/cassandra.git, not any other repository. In particular the github mirror very often get behind.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 3885-v2.txt, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395825#comment-13395825 ] 

Sylvain Lebresne commented on CASSANDRA-3885:
---------------------------------------------

bq. I always thought we have /cassandra/test/data/serialization/x.x if you want to test the older versions.

Ok, I see. So there is different ant targets to run the serialization test against older version. I've personally never ran that. Is there any reason why we don't just always run the serialization test on all versions?

But anyway, for this patch I'll regenerate the message binaries for 1.2 before committing.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 3885-v2.txt, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289292#comment-13289292 ] 

Sylvain Lebresne commented on CASSANDRA-3885:
---------------------------------------------

On an unrelated note, I disagree with using Pair<ByteBuffer, ByteBuffer> everywhere in the code, as it's not explicit enough (and it's a lot more character than worth it). So I (strongly) think we should define a specific class for this (say ColumnRange). *But*, I'm fine having that class extends {{Pair<ByteBuffer, ByteBuffer>}}, it's only a matter of having an explicit name.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3885:
--------------------------------------

    Description: 
This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.

We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.

Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.

This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

  was:
This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.

We need to support multiple ranges in a SQF, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.

Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.

This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

    
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Todd Nine
>             Fix For: 1.2
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289209#comment-13289209 ] 

Sylvain Lebresne commented on CASSANDRA-3885:
---------------------------------------------

bq. Now while the the current range we're looking for only needs "e" and "f" we have also deserialized "d"

That's what looks wrong to me (without having read the patch, I'll admit). It seems that as soon as we deserialize "d", we should realize that we're outside of [f-e] and thus we should start considering [d-b] and as such add the column to the result without a need for caching. Provided the requested range don't overlap (which I agree could be assumed by the sstable reader and validated way upfront), we shouldn't need to deserialize anything twice and we shouldn't need any cache.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289285#comment-13289285 ] 

Sylvain Lebresne commented on CASSANDRA-3885:
---------------------------------------------

It's not so much about necessarily mimicking the current behavior, but that the cache make the code a lot more disymetric between forward and reverse slices, which imo makes it harder to follow and more prone to having bug for reversed but not forward slices. Also (and though that is arguably minor), reusing the {{blockColumns}} queue as the "cache" as is done currently saves the cache allocation.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395944#comment-13395944 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

+1

I reviewed the patch, looks good overall.
I does not add failures other than the previously mentioned SerializationTest (CompactionsTest fails on a different method, for a different reason on my pc, that's expected right?).
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 3885-v2.txt, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290070#comment-13290070 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

Hi Jonathan.

I'm sorry I wasn't able to explain myself better. It's not that I wouldn't want to change things, it's that the most straightforward to use the same add cols and then recheck approach is to add more checks to the isColumnNeeded method and that is not a good idea since it is pretty convoluted and slow as is (ran some tests and the cache version vs the add the recheck version is about 5-10% faster about 5 on forward reads and 10 on reverse).

Still in retrospective the code surely be improved lot in terms or readability and Sylvain has made a couple of interesting suggestions that I'm trying out, hopefully we'll get the best of both worlds.


                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Alves updated CASSANDRA-3885:
-----------------------------------

    Attachment: CASSANDRA-3885.patch

Here's a patch that addresses most of the issues:

- A slice range is now specified as a ColumnSlice, a db package immutable object.

- Ranges are passed as arrays and never changed or copied.

The only thing that was not addressed was the cache issue. IMO cache is useful as it prevents disk rereads. It *might* be better to make sure columns are needed, but even that I'm really not sure since making sure would require a series of bytebuffer comparisons, and keeping a single index-lenght (max) of cols is probably not that bad.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395816#comment-13395816 ] 

Sylvain Lebresne commented on CASSANDRA-3885:
---------------------------------------------

bq.  as far as I can see from the 3885 branch there are other changes in there beyond this patch that are more recent than the last change to trunk.

Then you have the wrong version of trunk. The attached 3885-v2.txt patch applies cleanly to current trunk (as of this comment). 
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 3885-v2.txt, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Vijay (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay reassigned CASSANDRA-3885:
--------------------------------

    Assignee: Vijay
    
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>             Fix For: 1.2
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Comment Edited] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281330#comment-13281330 ] 

Vijay edited comment on CASSANDRA-3885 at 5/23/12 12:24 AM:
------------------------------------------------------------

Hi David, 
Thanks for the patch, (Needs a rebase and sorry for the delay in responding) 
1) We should not be hand editing generated code/ SliceRange.
2) Can we avoid doing ranges = ranges.toArray(new SliceRange[ranges.size()]); to avoid unnecessary copies
3) we should not create org.apache.cassandra.thrift.SliceRange inside of SliceQueryFilter we might want to keep the dependency of the thrift outside of the storage layer.
4) not sure about the caching the results... We can just validate it in the thrift layer so users will send proper query (this reduces complexity and comparison on reads)?


                
      was (Author: vijay2win@yahoo.com):
    Hi David, 
Thanks for the patch, (Needs a rebase and sorry for the delay in responding) 
1) We should not be hand editing generated code/ SliceRange by hand.
2) Can we avoid doing ranges = ranges.toArray(new SliceRange[ranges.size()]); to avoid unnecessary copies
3) we should not create org.apache.cassandra.thrift.SliceRange inside of SliceQueryFilter we might want to keep the dependency of the thrift outside of the storage layer.
4) not sure about the caching the results... We can just validate it in the thrift layer so users will send proper query (this reduces complexity and comparison on reads)?


                  
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292683#comment-13292683 ] 

Sylvain Lebresne commented on CASSANDRA-3885:
---------------------------------------------

bq. Why don't we make the "count" information reach ISR so that we can stop at the precise point the user query specifies?

Because it's more complex than that. The final result is based on merging (potentially) multiple sstables, so the Nth column in a given sstable is not necessarily the Nth returned column. 
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Comment Edited] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289250#comment-13289250 ] 

David Alves edited comment on CASSANDRA-3885 at 6/5/12 8:48 AM:
----------------------------------------------------------------

The new version of the code has one differece in regard to the blockColumns queue, that's that it never adds values to it that aren't going to be necessary (as as done previously) so the blockColumns queue is not really a cache at all, all values there are returned (if computNext() is called, of course) that's why the "cache" is needed only for a (partial) index segment size and only in reverse. In fact I can add couple of bb checks and it is no longer a cache and simply stuff that hasn't been added to the blockColumns queue yet.

Do you think it's important to change the code to mimic how it behaved before?
                
      was (Author: dr-alves):
    The new version of the code has one different in regard to the blockColumns queue, that's that it never adds values to it that aren't going to be necessary (as as done previously) so the blockColumns queue is not really a cache at all, all values there are returned (if computNext() is called, of course) that's why the "cache" is needed only for a (partial) index segment size and only in reverse. In fact I can add couple of bb checks and it is no longer a cache and simply stuff that hasn't been added to the blockColumns queue yet.

Do you think it's important to change the code to mimic how it behaved before?
                  
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Comment Edited] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287594#comment-13287594 ] 

David Alves edited comment on CASSANDRA-3885 at 6/1/12 11:48 PM:
-----------------------------------------------------------------

Hi Vijay

 Sorry for the delay also, somehow I didn't get notified that the issue was updated/commented.
 1) - With regard to SliceRange you're right, I just wanted to get some feedback before changing thrift stuff
 2) - We can definitely avoid that, maybe make IndexedSliceReader receive the ranges as an array?
 3) - Hadn't thought about that... maybe we could have a internal (within org.apache.cassandra.db) version and transform?
 4) - Cacheing the results is not about query validation, at least I can't see how better query validation would discard the necessity of cacheing to avoid re-reads. There is always a scenario where cacheing the results avoids re-reads (when we're sure that the next reversed range will need them), the thing is that right now we're cacheing without knowing wether they will be necessary (vs making sure the previous range will need them). In any case the maximum that gets cached is an index range of records per query so that might not be a terrible issue. What do you think?
                
      was (Author: dr-alves):
    Hi Vijay

 Sorry for the delay also, somehow I didn't get notified that the issue was updated/commented.
 1) - With regard to SliceRange you're right, I just wanted to get some feedback before changing thrift stuff
 2) - We can definitely avoid that, maybe make IndexedSliceReader receive the ranges as an array?
 3) - Hadn't thought about that... maybe we could have a internal (within org.apache.cassandra.db) version and transform?
 4) - Cacheing the results is not about query validation, at least I can't see how better query validation would discard the necessity of cacheing to avoid re-reads. There is always a scenario where cacheing the results avoids re-reads (when we're sure that the next reversed range will need them), the thing is that right now we're cacheing without knowing that they wether they will be necessary (vs making sure the previous range will need them). In any case the maximum that gets cached is an index range of records per query so that might not be a terrible issue. What do you think?
                  
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264336#comment-13264336 ] 

David Alves edited comment on CASSANDRA-3885 at 4/28/12 2:14 PM:
-----------------------------------------------------------------

Here's a patch that implements multi-slice in ISR, it behaves pretty much as expected/suggested, additionally it implementes re-read prevention. to clarify:
       - for forward scans it checks whether the next slice is within the current index segment and continues the scan there if it is.
       - for reverse scans it stores the columns before the current range, in cache. if the next range (which in reverse is before) starts/uses the same index segment the it uses the cache instead of re-reading the columns.

The test also reduces the number of bytebuffer comparisons in computeNext() as all the columns added to blockColumns are within the argument ranges and may be read so there is no need to make range checks.

The patch slightly changes SliceQueryFilter to use range sets but maintains backwards compatibility. Public access to start and finish is changed to be done through accessors.

The patch includes a somewhat (maybe too) verbose test of the new and old behavior.

The slice reader assumes that either the whole ranges are read reversed or not (there are no forward+reversed range mix).

The slice reader assumes that ranges are sorted correctly, e.g. that for forward lookup ranges are in lexicographic order of start elements and that for reverse lookup they are in reverse lexicographic order of finish (reverse start) elements. i.e. forward: [a,b],[d,e],[g,h] reverse: [h,g],[e,d],[b,a]. The reader also assumes that validation has been performed in terms of intervals (no overlapping intervals).

Tests are *mostly* green with the following exception:

[junit] Testsuite: org.apache.cassandra.db.CommitLogTest
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] 
    [junit] Testcase: org.apache.cassandra.db.CommitLogTest:terminated successfully:	Caused an ERROR
    [junit] Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
    [junit] junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
    [junit] 
    [junit] 
    [junit] Test org.apache.cassandra.db.CommitLogTest FAILED (timeout)



                
      was (Author: dr-alves):
    Here's a patch that implements multi-slice in ISR, it behaves pretty much as expected/suggested, additionally itimplemented re-read prevention. to clarify:
       - for forward scans it checks whether the next slice is within the current index segment and continues the scan there if it is.
       - for reverse scans it stores the columns before the current range, in cache. if the next range (which in reverse is before) starts/uses the same index segment the it uses the cache instead of re-reading the columns.

The test also reduces the number of bytebuffer comparisons in computeNext() as all the columns added to blockColumns are within the argument ranges and may be read so there is no need to make range checks.

The patch slightly changes SliceQueryFilter to use ranges sets but maintained backwards compatibility. Public access to start and finish is changed to be done through accessors.

The patch includes a somewhat (maybe too) verbose test of the new and old behavior.

The slice reader assumes that either the whole ranges are read reversed or not (there are no forward+reversed range mix).

The slice reader assumes that ranges are sorted correctly, e.g. that for forward lookup ranges are in lexicographic order of start elements and that for reverse lookup they are in reverse lexicographic order of finish (reverse start) elements. i.e. forward: [a,b],[d,e],[g,h] reverse: [h,g],[e,d],[b,a]. The reader also assumes that validation has been performed in terms of intervals (no overlapping intervals).


                  
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395799#comment-13395799 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

I still can't apply this cleanly to trunk. as far as I can see from the 3885 branch there are other changes in there beyond this patch that are more recent than the last change to trunk.

Sylvain do you want me to try and pick that up? (i mean take what you did and make it applicable to trunk)
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 3885-v2.txt, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395838#comment-13395838 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

right, I can confirm that, sorry for the n00biness.

patch applies cleanly, running tests.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 3885-v2.txt, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3885:
----------------------------------------

    Attachment:     (was: 3885-v2.txt)
    
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 3885-v2.txt, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281330#comment-13281330 ] 

Vijay commented on CASSANDRA-3885:
----------------------------------

Hi David, 
Thanks for the patch, (Needs a rebase and sorry for the delay in responding) 
1) We should not be hand editing generated code/ SliceRange by hand.
2) Can we avoid doing ranges = ranges.toArray(new SliceRange[ranges.size()]); to avoid unnecessary copies
3) we should not create org.apache.cassandra.thrift.SliceRange inside of SliceQueryFilter we might want to keep the dependency of the thrift outside of the storage layer.
4) not sure about the caching the results... We can just validate it in the thrift layer so users will send proper query (this reduces complexity and comparison on reads)?


                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Vijay (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296050#comment-13296050 ] 

Vijay commented on CASSANDRA-3885:
----------------------------------

{quote}
 So I could regenerate the binary messages, but I'm confused on what SerializationsTest is actually testing.
{quote}
I always thought we have /cassandra/test/data/serialization/x.x if you want to test the older versions.

LGTM +1

nit: setStart() method should have a assert which will check if there is only one Slice.
kind of affects:
{code}
// As soon as we'd done our first call, we want to reset the start column if we're paging
        if (isPaging)
            ((SliceQueryFilter)initialFilter()).setStart(ByteBufferUtil.EMPTY_BYTE_BUFFER);
{code}

PS: I was actually doing a parallel work without the prefetch queue and thought of sharing. (https://github.com/Vijay2win/cassandra/commit/31ca6fd9e1bafc1f4d8dfe858929586637bffdef#L3L18)
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 3885-v2.txt, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292674#comment-13292674 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

bq. but why do we need to do this?

I'm not completely sure I understood what you are asking. If you're asking why the params specify slices in reverse order when reading reversed, it was because I thought it would make sense to specify the ranges we want the first results to come from, first.

If you're asking about this particular piece of code:

{code}
if (reversed && readBackwards)
                currentSlice--;
            else
                currentSlice++;
{code}

lets say we have the following two slices [a,c] [f,j] and, in reverse [j,f] [c,a]. In IndexedBlockFecther we always start with the first slice ([a,c] and [j,f], respectively) and continue from there as we want to make sure we don't fetch more than necessary. In SimpleBlockFecther however we always fetch the whole row so we might as well process [j,f] [c,a] as if it were [a,c] [f,j] and simply add the blocks to the queue in reverse order. that's why we read reverse & backwards.

I hope that clarified it.

Your question reminds me of one another one I had, though. Why don't we make the "count" information reach ISR so that we can stop at the precise point the user query specifies? It seems wasteful to be reading the whole row in SimpleBlockFetcher is the user only requested the first 5 cols. This wouldn't work in reverse reads, of course, but would speed up forward ones, which are probably the most frequent, no?

bq. In IndexedSliceReader itself is a iterator and within which there is Simple and IndexedFetcher which is also kind of a iterator ... it might be better to make IndexedSliceReader as abstract class and add functionality into classes.

I'm not sure I understood what you are suggesting... is it that we move the code that decides whether we need a Simple- or IndexedBlockFetcher out of ISR and make ISR itself the parent abstract class to Simple- or IndexedBlockFetcher? 








                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Alves updated CASSANDRA-3885:
-----------------------------------

    Attachment: CASSANDRA-3885.patch

Refactored to use Pair<ByteBuffer,ByteBuffer> instead of ColumnSlice.

Thanks for the tip vijay.

                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288975#comment-13288975 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

With regard to count/reverse per slice, I tried to keep it relatively simple. My reasoning was that writing a query that specifies count and order (fwd,rev) per slice would be hard to do clearly, grammar wise, and that most people would mostly global (for all ranges) ordering and counting. That being said I can always rewrite if people think that per slice count and order are important.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289300#comment-13289300 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

Thank you for the thorough review, Sylvain.

My notes are:

With regard to bugs that's why theres' extensive testing. 

With regard to the symmetry of forward and reverse it is indeed broken but because the two operations (IMO) are really not symmetric since columns are always read from left to right. Not adding unnecessary data to blockColumns prevents having to check ranges for every tuple that comes out of it as was done before, which i think is an advantage in most cases but that bears the penalty of making it impossible to add out-of-order values (as was done before because values were always range-checked twice). Moreover the code is now naturally more complex because the operation is more complex.

All in all I think that most of what you're saying makes sense, but since things are working (and commented throughout) I'd prefer to leave it. The whole cache thing is arguable though and I'd be happy to add the bb checks to make sure only relevant data is added to it if that's the consensus.

With regard to the Pair<ByteBuffer,ByteBuffer> I agree, I think I jumped the gun on that. I particularly don't like the the unchecked casts when using arrays. I'll change that as per your suggestion.
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Alves updated CASSANDRA-3885:
-----------------------------------

    Attachment: CASSANDRA-3885.patch
    
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264336#comment-13264336 ] 

David Alves edited comment on CASSANDRA-3885 at 4/28/12 2:14 PM:
-----------------------------------------------------------------

Here's a patch that implements multi-slice in ISR, it behaves pretty much as expected/suggested, additionally it implements re-read prevention. to clarify:
       - for forward scans it checks whether the next slice is within the current index segment and continues the scan there if it is.
       - for reverse scans it stores the columns before the current range, in cache. if the next range (which in reverse is before) starts/uses the same index segment the it uses the cache instead of re-reading the columns.

The test also reduces the number of bytebuffer comparisons in computeNext() as all the columns added to blockColumns are within the argument ranges and may be read so there is no need to make range checks.

The patch slightly changes SliceQueryFilter to use range sets but maintains backwards compatibility. Public access to start and finish is changed to be done through accessors.

The patch includes a somewhat (maybe too) verbose test of the new and old behavior.

The slice reader assumes that either the whole ranges are read reversed or not (there are no forward+reversed range mix).

The slice reader assumes that ranges are sorted correctly, e.g. that for forward lookup ranges are in lexicographic order of start elements and that for reverse lookup they are in reverse lexicographic order of finish (reverse start) elements. i.e. forward: [a,b],[d,e],[g,h] reverse: [h,g],[e,d],[b,a]. The reader also assumes that validation has been performed in terms of intervals (no overlapping intervals).

Tests are *mostly* green with the following exception:

[junit] Testsuite: org.apache.cassandra.db.CommitLogTest
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] 
    [junit] Testcase: org.apache.cassandra.db.CommitLogTest:terminated successfully:	Caused an ERROR
    [junit] Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
    [junit] junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
    [junit] 
    [junit] 
    [junit] Test org.apache.cassandra.db.CommitLogTest FAILED (timeout)



                
      was (Author: dr-alves):
    Here's a patch that implements multi-slice in ISR, it behaves pretty much as expected/suggested, additionally it implementes re-read prevention. to clarify:
       - for forward scans it checks whether the next slice is within the current index segment and continues the scan there if it is.
       - for reverse scans it stores the columns before the current range, in cache. if the next range (which in reverse is before) starts/uses the same index segment the it uses the cache instead of re-reading the columns.

The test also reduces the number of bytebuffer comparisons in computeNext() as all the columns added to blockColumns are within the argument ranges and may be read so there is no need to make range checks.

The patch slightly changes SliceQueryFilter to use range sets but maintains backwards compatibility. Public access to start and finish is changed to be done through accessors.

The patch includes a somewhat (maybe too) verbose test of the new and old behavior.

The slice reader assumes that either the whole ranges are read reversed or not (there are no forward+reversed range mix).

The slice reader assumes that ranges are sorted correctly, e.g. that for forward lookup ranges are in lexicographic order of start elements and that for reverse lookup they are in reverse lexicographic order of finish (reverse start) elements. i.e. forward: [a,b],[d,e],[g,h] reverse: [h,g],[e,d],[b,a]. The reader also assumes that validation has been performed in terms of intervals (no overlapping intervals).

Tests are *mostly* green with the following exception:

[junit] Testsuite: org.apache.cassandra.db.CommitLogTest
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] 
    [junit] Testcase: org.apache.cassandra.db.CommitLogTest:terminated successfully:	Caused an ERROR
    [junit] Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
    [junit] junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
    [junit] 
    [junit] 
    [junit] Test org.apache.cassandra.db.CommitLogTest FAILED (timeout)



                  
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "David Alves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295647#comment-13295647 ] 

David Alves commented on CASSANDRA-3885:
----------------------------------------

thanks for reviewing and extending the patch to the rest of the system Sylvain.

PS: wrt to code style I've made changes to the imports to meet code style, but I was already using tjake's cassandra eclipse profile...
                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 3885-v2.txt, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3885) Support multiple ranges in SliceQueryFilter

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3885:
----------------------------------------

    Attachment: 3885-v2.txt

Ok, I like the last patch much more and I've convinced myself that this prefetched thing is a good idea with multiple slices.

I believe this can be simpler a tiny bit further though. Typically the {{readBackward}} business is a tad confusing and since only SimpleBlockFetcher really cares about it, it can be moved there. I also think the switch of slice/lookup of index block can be cleaned by doing both together. I'm attaching a v2 that does those things, a few other cleanups, adds a bunch of comments and also adds two optimizations:
* Since we always know the order on which we read columns, we can remember when we've "entered" a slice to avoid a bunch of comparisons until we "leave" it.
* When we switch to the next slice, we must reuse the binary search to locate where that new slice starts rather than going to the next block, otherwise we may end up reading lots of unnecessary blocks.

I'll note that imho, with those optimizations, SimpleSliceReader (not to be confused with SimpleBlockFetcher) isn't really useful anymore, but we probably want to make sure by benchmarking it.

Anyway, outside of IndexedSliceReader, there was a few problems (that are all solved by the attached v2):
* We needed to deal with multiple slices for the memtable iterator too, otherwise we'll end up returning the wrong columns.
* We needed to correctly serialize multiple slices for the inter-node protocol.

I'll note that with this patch, db.SerializationsTest is broken, but that is no mystery: it's trying to read old messages using the current protocol version. So I could regenerate the binary messages, but I'm confused on what SerializationsTest is actually testing. I though it was making sure we don't break backward compatibility. If we regenerate the binary message at each release we're not testing that at all.

ps: @david, it seems your editor is splitting lines where it shouldn't and is reordering imports in a way that don't respect http://wiki.apache.org/cassandra/CodeStyle.

                
> Support multiple ranges in SliceQueryFilter
> -------------------------------------------
>
>                 Key: CASSANDRA-3885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3885
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: David Alves
>             Fix For: 1.2
>
>         Attachments: 3885-v2.txt, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch, CASSANDRA-3885.patch
>
>
> This is logically a subtask of CASSANDRA-2710, but Jira doesn't allow sub-sub-tasks.
> We need to support multiple ranges in a SliceQueryFilter, and we want querying them to be efficient, i.e., one pass through the row to get all of the ranges, rather than one pass per range.
> Supercolumns are irrelevant since the goal is to replace them anyway.  Ignore supercolumn-related code or rip it out, whichever is easier.
> This is ONLY dealing with the storage engine part, not the StorageProxy and Command intra-node messages or the Thrift or CQL client APIs.  Thus, a unit test should be added to ColumnFamilyStoreTest to demonstrate that it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira