You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jay Zhuang (JIRA)" <ji...@apache.org> on 2017/08/07 05:01:00 UTC
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator

    [ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116055#comment-16116055 ] 

Jay Zhuang commented on CASSANDRA-9988:
---------------------------------------

Here are the test results to search target on the different index with the following search algorithms:
1. binary search
2. exponential search
3. binary search optimized for next adjacent value
!9988-3tests.png|tests!

Here are my takes from the results:
h4. 1. binary search:
 Pros:
  * best on average and evenly distributed;
  * Simple code, use JAVA native API.

Cons:
  * not optimized to search for the adjacent value.

h4. 2. exponential search:
Pros:
  * optimized to search the first and second value for {{Dir.ASC}}.

Cons:
  * very bad average performance: compare to binarySearch it has only half the throughput. {{Dir.DESC}} will have the same performance degradation and we do use DESC for some places;
  * complicate code, hard to maintain: For example, there's a bug in above code, it should be {{while (prob_ub <= ub && (c = comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size {{2^n - 1}} and search for the last element, it's tricky to find out.

I think we could make the algorithm work for {{Dir.DESC}} too, but it will make the code even more complicated.

h4. 3. binary search optimized for next adjacent value
Pros:
  * Optimized to search next adjacent value, works for both {{Dir.ASC}} and {{Dir.DESC}}

Cons:
  * Average performance is a litter bit worse than binarySearch

I'm not familiar with the whole cassandra code base, so not sure how often we search for the adjacent value. Personally, I would still prefer binarySearch:
* consistency with other cassandra code, plus it's the current behave right now for btree search;
* smaller mean deviations, could be good for tp99;
* clean code.
But I'm fine with any solutions:
| 1. binary search | [9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] |
| 2. exponential search | [9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] |
| 3. binary search optimized| [9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] |

> Introduce leaf-only iterator
> ----------------------------
>
>                 Key: CASSANDRA-9988
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Benedict
>            Assignee: Jay Zhuang
>            Priority: Minor
>              Labels: patch
>             Fix For: 4.0
>
>         Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 9988-result3.png, 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf page. In this case it _may_ be more efficient to specialise our iterator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org