You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jay Zhuang (JIRA)" <ji...@apache.org> on 2017/08/07 05:01:00 UTC
[jira] [Commented] (CASSANDRA-9988) Introduce leaf-only iterator
[ https://issues.apache.org/jira/browse/CASSANDRA-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116055#comment-16116055 ]
Jay Zhuang commented on CASSANDRA-9988:
---------------------------------------
Here are the test results to search target on the different index with the following search algorithms:
1. binary search
2. exponential search
3. binary search optimized for next adjacent value
!9988-3tests.png|tests!
Here are my takes from the results:
h4. 1. binary search:
Pros:
* best on average and evenly distributed;
* Simple code, use JAVA native API.
Cons:
* not optimized to search for the adjacent value.
h4. 2. exponential search:
Pros:
* optimized to search the first and second value for {{Dir.ASC}}.
Cons:
* very bad average performance: compare to binarySearch it has only half the throughput. {{Dir.DESC}} will have the same performance degradation and we do use DESC for some places;
* complicate code, hard to maintain: For example, there's a bug in above code, it should be {{while (prob_ub <= ub && (c = comparator.compare(keys\[probe_ub], key)) < 0)}} it only impacts the tree size {{2^n - 1}} and search for the last element, it's tricky to find out.
I think we could make the algorithm work for {{Dir.DESC}} too, but it will make the code even more complicated.
h4. 3. binary search optimized for next adjacent value
Pros:
* Optimized to search next adjacent value, works for both {{Dir.ASC}} and {{Dir.DESC}}
Cons:
* Average performance is a litter bit worse than binarySearch
I'm not familiar with the whole cassandra code base, so not sure how often we search for the adjacent value. Personally, I would still prefer binarySearch:
* consistency with other cassandra code, plus it's the current behave right now for btree search;
* smaller mean deviations, could be good for tp99;
* clean code.
But I'm fine with any solutions:
| 1. binary search | [9988-trunk|https://github.com/cooldoger/cassandra/tree/9988-trunk] |
| 2. exponential search | [9988-trunk-exp|https://github.com/cooldoger/cassandra/tree/9988-trunk-exp] |
| 3. binary search optimized| [9988-trunk-x|https://github.com/cooldoger/cassandra/tree/9988-trunk-x] |
> Introduce leaf-only iterator
> ----------------------------
>
> Key: CASSANDRA-9988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9988
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Benedict
> Assignee: Jay Zhuang
> Priority: Minor
> Labels: patch
> Fix For: 4.0
>
> Attachments: 9988-3tests.png, 9988-data.png, 9988-result2.png, 9988-result3.png, 9988-result.png, 9988-test-result3.png, 9988-test-result-expsearch.xlsx, 9988-test-result-raw.png, 9988-test-result.xlsx, 9988-trunk-new.txt, 9988-trunk-new-update.txt, trunk-9988.txt
>
>
> In many cases we have small btrees, small enough to fit in a single leaf page. In this case it _may_ be more efficient to specialise our iterator.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org