You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Alex Petrov (JIRA)" <ji...@apache.org> on 2017/03/07 07:41:33 UTC
[jira] [Comment Edited] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient

    [ https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898877#comment-15898877 ] 

Alex Petrov edited comment on CASSANDRA-12915 at 3/7/17 7:40 AM:
-----------------------------------------------------------------

Sure, what I'm saying is that behaviour is unnecessary to fix the problem that we have. SASI Range iterators are taking the shortest range.

For example, if we have records like 

|| a || b || c ||
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |

And {{b}} and {{c}} are SASI-indexed, and we want to run the query {{WHERE b = 2 AND c = 1}}, we will get 2 iterators, one with {{2}} (for the column {{b}}) and the second one with {{1,2,3}} (for the column {{c}}). Now, SASI will take the shortest range ({{b}}) and start iterating by "rewinding" the {{c}} iterator to the token {{2}}. If there's no item for the token, intersection will be empty. 

Now, moving closer to the problem. If we had just imbalanced iterators (one returning 100 results and the other just 1), SASI would be able to efficiently intersect ranges and hit the storage to retrieve just a single row. In the case with __empty__ results from one of the iterators, the one you're fixing, we were simply using a single iterator, and had to hit the storage a 100 times (assuming the other iterator returns 100 results). Now, with what I propose, we will hit is 0 times. Same with your approach, although with a lot of complex logic that I can not justify, so I have asked you to clarify. The trick was simply to "replace" that [null check|https://github.com/ifesdjeen/cassandra/commit/78b1ff630536b0f48787ced74a66d702d13637ba#diff-25d7f486e2818c56d6b01aa952d459f3L146] with an actual iterator, to let the intersection know there's a second part, and it's empty. I thought it was already discussed in [this comment|https://issues.apache.org/jira/browse/CASSANDRA-12915?focusedCommentId=15728412&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15728412] but I'm happy to re-iterate:

bq. The way RangeIterator work and how it's optimised, we're [picking|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L234-L237] the "shortest" token tree and start skipping the second token tree based on the results retrieved from the first one. So one of the indexes will be iterated anyways. The second index (since it's larger) might  have to fetch/load roughly 10 blocks (since they might be located far from one another on disk), but it never has to fetch all 2M items. It'll iterate only as many items as the smallest index has. For example, two index queries would skip through (left is which index is used, right is index of the item within the token tree):

and 

bq. Moreover, RangeIterator already has [different optimisation strategies|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L76-L78] based on differences in cardinality. I'd say that current benchmark shows that the query is slightly slower (since we have to go through around twice as much data on disk). But given the numbers at hand  the difference that is small and it's sub-millisecond, this optimisation seems not to pay the complexity that it brings.

That's one of the ideas behind SASI, pretty much: to be able to efficiently merge, iterate and skip iterators.

Looks like we're mostly on the same page. I've ran (relatively) large scale tests with a variant of the patch I've posted (160+GB of data), and it works exactly as expected.  Now, I want to simplify the patch and make sure we do not add the code we do not need.  

So there's no need to add additional logic to hide empty ranges, it's already handled. 


was (Author: ifesdjeen):
Sure, what I'm saying is that behaviour is unnecessary to fix the problem that we have. SASI Range iterators are taking the shortest range.

For example, if we have records like 

|| a || b || c ||
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |

And {{b}} and {{c}} are SASI-indexed, and we want to run the query {{WHERE b = 2 AND c = 1}}, we will get 2 iterators, one with {{2}} (for the column {{b}}) and the second one with {{1,2,3}} (for the column {{c}}). Now, SASI will take the shortest range ({{b}}) and start iterating by "rewinding" the {{c}} iterator to the token {{2}}. If there's no item for the token, intersection will be empty. 

Now, moving closer to the problem. If we had just imbalanced iterators (one returning 100 results and the other just 1), SASI would be able to efficiently intersect ranges and hit the storage to retrieve just a single row. In the case with __empty__ results from one of the iterators, the one you're fixing, we were simply using a single iterator, and had to hit the storage a 100 times (assuming the other iterator returns 100 results). Now, with what I propose, we will hit is 0 times. Same with your approach, although with a lot of complex logic that I can not justify, so I have asked you to clarify. The trick was simply to "replace" that [null check|https://github.com/ifesdjeen/cassandra/commit/78b1ff630536b0f48787ced74a66d702d13637ba#diff-25d7f486e2818c56d6b01aa952d459f3L146] with an actual iterator, to let the intersection know there's a second part, and it's empty. I thought it was already discussed in [this comment|https://issues.apache.org/jira/browse/CASSANDRA-12915?focusedCommentId=15728412&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15728412] but I'm happy to re-iterate:

bq. The way RangeIterator work and how it's optimised, we're [picking|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L234-L237] the "shortest" token tree and start skipping the second token tree based on the results retrieved from the first one. So one of the indexes will be iterated anyways. The second index (since it's larger) might  have to fetch/load roughly 10 blocks (since they might be located far from one another on disk), but it never has to fetch all 2M items. It'll iterate only as many items as the smallest index has. For example, two index queries would skip through (left is which index is used, right is index of the item within the token tree):

and 

bq. Moreover, RangeIterator already has [different optimisation strategies|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L76-L78] based on differences in cardinality. I'd say that current benchmark shows that the query is slightly slower (since we have to go through around twice as much data on disk). But given the numbers at hand  the difference that is small and it's sub-millisecond, this optimisation seems not to pay the complexity that it brings.

That's one of the ideas behind SASI, pretty much: to be able to efficiently merge, iterate and skip iterators.

Looks like we're mostly on the same page. I've ran large-scale tests with a variant of the patch I've posted (160+GB of data), and it works exactly as expected.  Now, I want to simplify the patch and make sure we do not add the code we do not need.  

So there's no need to add additional logic to hide empty ranges, it's already handled. 

> SASI: Index intersection with an empty range really inefficient
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-12915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: Corentin Chary
>            Assignee: Corentin Chary
>             Fix For: 3.11.x, 4.x
>
>
> It looks like RangeIntersectionIterator.java and be pretty inefficient in some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in disk.TokenTree.getTokenAt().
> if I patch RangeIntersectionIterator so that it doesn't try to do the intersection (and effectively only use 'index1') the query will run in a few tenth of milliseconds.
> I see multiple solutions for that:
> * Add a static thresold to avoid the use of the index for the intersection when we know it will be slow. Probably when the range size factor is very small and the range size is big.
> * CASSANDRA-10765



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)