Posted to commits@cassandra.apache.org by "Alex Liu (JIRA)" <ji...@apache.org> on 2013/10/17 08:02:44 UTC

[jira] [Commented] (CASSANDRA-6048) CQL3 data filtering improvement

    [ https://issues.apache.org/jira/browse/CASSANDRA-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797639#comment-13797639 ] 

Alex Liu commented on CASSANDRA-6048:
-------------------------------------

The join algorithm is as follows:
  1. Use the index with the smallest mean column count as the primary index.
  2. Because the columns of each index CF are sorted, we can advance the iterators over the indexes in step with each other.
For example:
{code}
    1. Let's assume primary index A has index values [composite_name1, composite_name2, composite_name3,
       composite_name4, composite_name6] for column_1, with iterator iterator_a.
    2. Another index B has index values [composite_name2, composite_name4, composite_name5] for column_1,
       with iterator iterator_b.
    3. First, move iterator_a to composite_name1.
    4. iterator_b starts at composite_name2, which is larger than composite_name1, so we move iterator_a to
       composite_name2. It matches, so return composite_name2.
    5. Next, move iterator_b to composite_name4, which is larger than composite_name2, so we move iterator_a to
       composite_name3, then to composite_name4. It matches, so return composite_name4.
    6. Next, move iterator_b to composite_name5, which is larger than composite_name4, so we move iterator_a to
       composite_name6. That is larger than composite_name5, so we need to move iterator_b further, but there
       is no more data, so return end of data.
{code}

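A minimal sketch of the join walked through above, assuming each index can be read as a sorted sequence of names and that plain Python string ordering stands in for the index CF comparator. The function name `intersect_sorted` and the `_END` sentinel are hypothetical, not Cassandra code. It steps whichever iterator is behind and emits a name only when both iterators agree:

```python
from typing import Iterable, Iterator

_END = object()  # hypothetical sentinel marking an exhausted iterator


def intersect_sorted(primary: Iterable[str], other: Iterable[str]) -> Iterator[str]:
    """Yield the names present in both sorted inputs (a merge join).

    Both inputs must be sorted in the same order; Python string ordering
    is an assumption standing in for the index CF comparator.
    """
    it_a, it_b = iter(primary), iter(other)
    a, b = next(it_a, _END), next(it_b, _END)
    while a is not _END and b is not _END:
        if a == b:
            yield a  # both iterators agree: the name satisfies both indexes
            a, b = next(it_a, _END), next(it_b, _END)
        elif a < b:
            a = next(it_a, _END)  # primary index is behind: advance iterator_a
        else:
            b = next(it_b, _END)  # other index is behind: advance iterator_b


# Sample values: the names the walkthrough steps iterator_a and iterator_b through.
index_a = ["composite_name1", "composite_name2", "composite_name3",
           "composite_name4", "composite_name6"]
index_b = ["composite_name2", "composite_name4", "composite_name5"]
print(list(intersect_sorted(index_a, index_b)))  # ['composite_name2', 'composite_name4']
```

Because the result is itself an iterator, more than two indexes could be joined by chaining, e.g. `functools.reduce(intersect_sorted, [index_b, index_c], index_a)`. This sketch advances one entry at a time; an index iterator that supports seeking could jump straight to the other iterator's current name instead.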

> CQL3 data filtering improvement
> -------------------------------
>
>                 Key: CASSANDRA-6048
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6048
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>         Attachments: 6048-1.2-branch.txt
>
>
> Existing data filtering uses the following algorithm:
> {code}
>    1. Find the most selective predicate, based on the smallest mean column count.
>    2. Fetch rows for the most selective predicate, then filter the data on the remaining predicates.
> {code}
> So we could potentially improve performance by:
> {code}
>    1. joining multiple predicates, then filtering the data on the remaining predicates.
>    2. fine-tuning the best-predicate selection algorithm.
> {code}
> A multi-predicate join could improve performance when one predicate has many entries and another has very few. It means only a few index CF reads: join the row keys, fetch the rows, then filter on the remaining predicates.
> Another approach is to have an index on multiple columns.

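The two strategies in the description above can be contrasted with a small in-memory model. This is a sketch under stated assumptions, not Cassandra's code: indexes are modeled as `(column, value) -> sorted row keys`, the entry count stands in for the mean column count statistic, and `select_then_filter`, `join_then_filter` and `join_width` are hypothetical names. For brevity the join uses a set intersection rather than the sorted-iterator merge join described in the comment. The returned count is the number of rows fetched; note that the join reads more index entries to fetch fewer rows.

```python
from typing import Dict, List, Tuple

Predicate = Tuple[str, str]         # hypothetical model: (column, value) equality predicate
Rows = Dict[str, Dict[str, str]]    # row key -> {column: value}
Index = Dict[Predicate, List[str]]  # predicate -> sorted row keys (stands in for an index CF)


def build_index(rows: Rows) -> Index:
    """Build one sorted row-key list per (column, value) pair."""
    index: Index = {}
    for key in sorted(rows):
        for column, value in rows[key].items():
            index.setdefault((column, value), []).append(key)
    return index


def matches(row: Dict[str, str], preds: List[Predicate]) -> bool:
    return all(row.get(column) == value for column, value in preds)


def select_then_filter(preds, index, mean_columns, rows):
    """Existing approach: read the most selective index, fetch its rows, filter."""
    best = min(preds, key=lambda p: mean_columns[p])
    fetched = index.get(best, [])
    return [k for k in fetched if matches(rows[k], preds)], len(fetched)


def join_then_filter(preds, index, mean_columns, rows, join_width=2):
    """Proposed approach: join the row keys of several indexes, fetch, filter the rest."""
    ordered = sorted(preds, key=lambda p: mean_columns[p])
    joined, rest = ordered[:join_width], ordered[join_width:]
    keys = set(index.get(joined[0], []))
    for p in joined[1:]:
        keys &= set(index.get(p, []))  # set intersection in place of a merge join
    fetched = sorted(keys)
    return [k for k in fetched if matches(rows[k], rest)], len(fetched)


rows = {
    "k1": {"state": "CA", "status": "active", "tier": "gold"},
    "k2": {"state": "CA", "status": "inactive", "tier": "gold"},
    "k3": {"state": "CA", "status": "active", "tier": "silver"},
    "k4": {"state": "NY", "status": "active", "tier": "gold"},
    "k5": {"state": "CA", "status": "inactive", "tier": "silver"},
    "k6": {"state": "NY", "status": "inactive", "tier": "gold"},
}
preds = [("state", "CA"), ("status", "active"), ("tier", "gold")]
index = build_index(rows)
# Assumption: the entry count stands in for the per-index mean column count.
mean_columns = {p: len(index[p]) for p in preds}
print(select_then_filter(preds, index, mean_columns, rows))  # (['k1'], 3)
print(join_then_filter(preds, index, mean_columns, rows))    # (['k1'], 2)
```

Both strategies return the same rows; on this toy data the join fetches two rows instead of three, at the cost of reading two index lists instead of one.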

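The "index on multiple columns" alternative can be sketched the same way. Assuming a hypothetical in-memory model where a composite index maps a tuple of column values to sorted row keys (`build_composite_index` is an illustrative name, not a Cassandra API), a query with equality predicates on all indexed columns becomes a single lookup with no join:

```python
from typing import Dict, List, Tuple

Rows = Dict[str, Dict[str, str]]  # hypothetical model: row key -> {column: value}


def build_composite_index(rows: Rows, columns: Tuple[str, ...]) -> Dict[Tuple[str, ...], List[str]]:
    """Index rows on several columns at once: (value1, value2, ...) -> sorted row keys."""
    index: Dict[Tuple[str, ...], List[str]] = {}
    for key in sorted(rows):
        row = rows[key]
        if all(c in row for c in columns):  # skip rows missing an indexed column
            index.setdefault(tuple(row[c] for c in columns), []).append(key)
    return index


rows = {
    "k1": {"state": "CA", "status": "active"},
    "k2": {"state": "CA", "status": "inactive"},
    "k3": {"state": "CA", "status": "active"},
    "k4": {"state": "NY", "status": "active"},
}
by_state_status = build_composite_index(rows, ("state", "status"))
# state = 'CA' AND status = 'active' is answered by one lookup.
print(by_state_status.get(("CA", "active"), []))  # ['k1', 'k3']
```

In this hash-map sketch the index only helps queries that constrain every indexed column, and every write has to maintain one more index entry.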

--
This message was sent by Atlassian JIRA
(v6.1#6144)