Posted to commits@cassandra.apache.org by "Alex Petrov (JIRA)" <ji...@apache.org> on 2016/12/06 19:18:58 UTC

[jira] [Comment Edited] (CASSANDRA-12910) SASI: calculatePrimary() always returns null

    [ https://issues.apache.org/jira/browse/CASSANDRA-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15726419#comment-15726419 ] 

Alex Petrov edited comment on CASSANDRA-12910 at 12/6/16 7:18 PM:
------------------------------------------------------------------

I can see no correlation between filled columns in rows and this patch. 

Let's say there are two sstables: 

{code}
| a | b | c |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |

| a | b | c |
| 4 | 4 | 4 |
| 5 | 5 | 2 |
{code}

Assume a {{PRIMARY KEY a}} and the query {{SELECT * FROM tbl WHERE b = 5 AND c = 2}}. Matches for the column {{b}} are only in the second sstable, while matches for the column {{c}} are in both the first and the second sstable. Since this is an {{AND}} query, querying the second sstable alone is enough to obtain all necessary results. So we pick the index on the column {{b}} as primary and, instead of using indexes over both sstables, use the indexes for only that one sstable, as specified [here|https://github.com/ifesdjeen/cassandra/blob/8a64718d8447029584e24b3a5b75cde70e835dd7/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java#L208-L212].
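
The selection described above can be sketched in a few lines of Java. This is only an illustration of the idea, not the code in QueryController: the class name {{PrimaryIndexSketch}}, the method {{pickPrimary}} and the sstable labels are hypothetical, and the per-column match sets are filled in by hand from the two example sstables.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PrimaryIndexSketch {
    // Hypothetical helper: given, for each predicate of an AND query, the set of
    // sstables whose index has a match, return the predicate with the fewest sstables.
    static Map.Entry<String, Set<String>> pickPrimary(Map<String, Set<String>> matchesByPredicate) {
        Map.Entry<String, Set<String>> primary = null;
        for (Map.Entry<String, Set<String>> e : matchesByPredicate.entrySet()) {
            // The first candidate is always accepted; later ones only if strictly smaller.
            if (primary == null || e.getValue().size() < primary.getValue().size()) {
                primary = e;
            }
        }
        return primary;
    }

    public static void main(String[] args) {
        // Match sets taken from the example: b = 5 only matches in the second sstable,
        // c = 2 matches in both.
        Map<String, Set<String>> matches = new LinkedHashMap<>();
        matches.put("b = 5", new TreeSet<String>(Arrays.asList("sstable-2")));
        matches.put("c = 2", new TreeSet<String>(Arrays.asList("sstable-1", "sstable-2")));

        Map.Entry<String, Set<String>> primary = pickPrimary(matches);
        // prints: b = 5 -> [sstable-2]
        System.out.println(primary.getKey() + " -> " + primary.getValue());
    }
}
```

Because every row in the result of an {{AND}} query has to match {{b = 5}}, the sstables in the primary's set are the only ones that need to be consulted for the remaining predicates.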


> SASI: calculatePrimary() always returns null
> --------------------------------------------
>
>                 Key: CASSANDRA-12910
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12910
>             Project: Cassandra
>          Issue Type: Bug
>          Components: sasi
>            Reporter: Corentin Chary
>            Assignee: Corentin Chary
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 0002-sasi-fix-calculatePrimary.patch
>
>
> While investigating performance issues with SASI (https://github.com/criteo/biggraphite/issues/174 if you want to know more), I ended up finding calculatePrimary() in QueryController.java, which apparently should return the "primary index".
> It lacks documentation, and I'm unsure what the "primary index" should be, but apparently this function never returns one because primaryIndexes.size() is always 0.
> https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java#L237
> I'm unsure if the proper fix is checking if the collection is empty or reversing the operator (selecting the index with higher cardinality versus the one with lower cardinality).
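
The behaviour reported above can be reproduced with a small stand-alone sketch. It is a reduction of the pattern as the report describes it (a running "primary" set that starts empty and is replaced only by a strictly smaller candidate), not a copy of the Cassandra source. The names {{CalculatePrimarySketch}}, {{pickBuggy}} and {{pickFixed}} are hypothetical, and {{pickFixed}} is just one possible reading of the "check for an empty collection" option.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CalculatePrimarySketch {
    // Reported pattern: the running minimum starts as an empty set, so
    // "primary.size() > candidate.size()" is "0 > n" and can never be true.
    static Set<String> pickBuggy(List<Set<String>> candidates) {
        Set<String> primary = Collections.emptySet();
        Set<String> chosen = null;
        for (Set<String> candidate : candidates) {
            if (primary.size() > candidate.size()) {
                primary = candidate;
                chosen = candidate;
            }
        }
        return chosen; // always null
    }

    // One possible fix (an assumption, not the committed patch): accept the first
    // candidate unconditionally, then keep the smallest one seen so far.
    static Set<String> pickFixed(List<Set<String>> candidates) {
        Set<String> chosen = null;
        for (Set<String> candidate : candidates) {
            if (chosen == null || chosen.size() > candidate.size()) {
                chosen = candidate;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Set<String>> candidates = Arrays.<Set<String>>asList(
                new HashSet<String>(Arrays.asList("sstable-1", "sstable-2")),
                new HashSet<String>(Arrays.asList("sstable-2")));

        System.out.println(pickBuggy(candidates)); // prints: null
        System.out.println(pickFixed(candidates)); // prints: [sstable-2]
    }
}
```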


