Posted to commits@cassandra.apache.org by "Alex Petrov (JIRA)" <ji...@apache.org> on 2016/08/01 18:24:20 UTC

[jira] [Comment Edited] (CASSANDRA-11990) Address rows rather than partitions in SASI

    [ https://issues.apache.org/jira/browse/CASSANDRA-11990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399120#comment-15399120 ] 

Alex Petrov edited comment on CASSANDRA-11990 at 8/1/16 6:24 PM:
-----------------------------------------------------------------

During several discussions with [~xedin] we came up with the idea of evaluating support for different partitioners, since it would help with wider SASI adoption and remove the current limitation to {{Long}} tokens. I've evaluated it and can conclude that support for constant-size tokens can be included in this patch without large overhead. The patch has been adjusted accordingly. There are still several failing tests, although they'll be fixed shortly.

Support for variable-size tokens (for partitioners such as {{ByteOrderedPartitioner}}) requires a much larger time investment. My personal suggestion is to encode them with their size and avoid on-disk format changes. This will result in a more complex iteration process for variable-size tokens, since we'll have to skip tokens depending on their size and won't be able to use simple multiplication for offset calculation. I've made a small patch / proof of concept for variable-size tokens by adding a {{serializedSize}} method to the token tree nodes; currently (for the sake of the POC and in order to save some time) it is implemented by reusing the {{serialize}} function with a throwaway byte buffer, and offsets are calculated by iterating and reading the integer size prefixes. It worked just fine for simple cases. I'll mention that the SASI code is written very well and the offset calculation methods are well isolated.
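
To make the trade-off concrete, here's a minimal sketch (hypothetical names, not the actual TokenTree code) contrasting fixed-size offset arithmetic with the length-prefixed iteration needed for variable-size tokens; the {{serializedSize}} helper mirrors the throwaway-buffer shortcut described above:

{code:java}
import java.nio.ByteBuffer;

// Minimal sketch, not the actual TokenTree code.
public final class TokenOffsetSketch
{
    // Fixed-size tokens (e.g. 8-byte Murmur3 longs): the i-th token's
    // offset is a simple multiplication.
    static int fixedOffset(int i, int tokenSize)
    {
        return i * tokenSize;
    }

    // Variable-size tokens encoded as [int size][size bytes]: finding
    // the i-th token means skipping over every preceding one.
    static int variableOffset(ByteBuffer buf, int i)
    {
        int offset = 0;
        for (int skipped = 0; skipped < i; skipped++)
        {
            int size = buf.getInt(offset);   // read the size prefix
            offset += Integer.BYTES + size;  // skip prefix + token bytes
        }
        return offset;
    }

    // The POC shortcut described above: derive a node's serialized size
    // by serializing into a throwaway buffer and measuring how far the
    // position advanced (Node is a hypothetical stand-in interface).
    interface Node { void serialize(ByteBuffer out); }

    static int serializedSize(Node node)
    {
        ByteBuffer throwaway = ByteBuffer.allocate(4096);
        node.serialize(throwaway);
        return throwaway.position();
    }
}
{code}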

That said, I'd suggest leaving the "algorithmic" heavy lifting (variable-size token offset calculation) for a separate ticket to reduce the scope of the current one. Since it's not going to require on-disk format changes, we can safely postpone this work.


Another thing that's been mentioned is including the column offset in the clustering offset long. I'll be evaluating this proposal in terms of performance today. It seems that we can avoid increasing the size of the {{long[]}} array that holds the offsets, and this change can help avoid post-filtering altogether. An additional optimisation (which, once again, could be left for a follow-up patch) is to avoid the second seek within the data file when we are only querying columns that are indexed. This could be a significant performance improvement, although it'd be good to discuss whether such queries are widely used.
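
For illustration, a minimal sketch of the packing idea (the 48/16 bit split and all names here are assumptions, not the actual SASI layout): the column offset rides in the unused high bits of the existing clustering offset long, so the {{long[]}} offsets array doesn't grow.

{code:java}
// Minimal sketch of the bit-packing idea; names and split are assumed.
public final class PackedOffsetSketch
{
    private static final int COLUMN_BITS = 16;                  // assumed split
    private static final int SHIFT = Long.SIZE - COLUMN_BITS;   // 48
    private static final long CLUSTERING_MASK = (1L << SHIFT) - 1;

    static long pack(long clusteringOffset, int columnOffset)
    {
        assert (clusteringOffset & ~CLUSTERING_MASK) == 0 : "clustering offset too large";
        assert columnOffset >= 0 && columnOffset < (1 << COLUMN_BITS) : "column offset too large";
        return ((long) columnOffset << SHIFT) | clusteringOffset;
    }

    static long clusteringOffset(long packed)
    {
        return packed & CLUSTERING_MASK;
    }

    static int columnOffset(long packed)
    {
        return (int) (packed >>> SHIFT);
    }
}
{code}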

cc [~slebresne] [~iamaleksey] [~jbellis] [~beobal]


> Address rows rather than partitions in SASI
> -------------------------------------------
>
>                 Key: CASSANDRA-11990
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11990
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>         Attachments: perf.pdf, size_comparison.png
>
>
> Currently, a lookup in the SASI index returns the key position of the partition. After the partition lookup, the rows are iterated and the operators are applied in order to filter out the ones that do not match.
> bq. TokenTree which accepts variable-size keys (which would enable different partitioners, collections support, primary key indexing, etc.),



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)