You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Benedict Elliott Smith (Jira)" <ji...@apache.org> on 2020/01/16 21:16:00 UTC
[jira] [Commented] (CASSANDRA-15510) BTree: Improve Building, Inserting and Transforming

    [ https://issues.apache.org/jira/browse/CASSANDRA-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017490#comment-17017490 ] 

Benedict Elliott Smith commented on CASSANDRA-15510:
----------------------------------------------------

[3.0|http://github.com/belliottsmith/cassandra/tree/15510]

> BTree: Improve Building, Inserting and Transforming
> ---------------------------------------------------
>
>                 Key: CASSANDRA-15510
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15510
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Other
>            Reporter: Benedict Elliott Smith
>            Assignee: Benedict Elliott Smith
>            Priority: Normal
>
> This work was originally undertaken as a follow-up to CASSANDRA-15367 to ensure performance is strictly improved, but it may no longer be needed for that purpose.  It’s still hugely impactful, however.  It remains to be decided where this should land.
> The current {{BTree}} implementation is suboptimal in a number of ways, with very little focus having been given to its performance besides its memory-occupancy.  This patch aims to address that, specifically improving the performance and allocations involved in: building, transforming and inserting into a tree.
> To facilitate this work, the {{BTree}} definition is modified slightly, so that we can perform some simple arithmetic on tree sizes.  Specifically, trees of depth n are defined to have a maximum capacity of {{branchFactor^n - 1}}, which translates into capping the number of leaf children at {{branchFactor-1}}, as opposed to {{branchFactor}}.  Since {{branchFactor}} is a power of 2, this permits fast tree size arithmetic, enabling some of these changes.
> h2. Building
> The static build method has been modified to utilise dedicated {{buildPerfect}} methods that build either perfectly dense or perfectly sparse sub-trees.  These perfect trees all share their {{sizeMap}} with each other, and can be built more efficiently than trees of arbitrary size.  The specifics are described in detail in the comments, but this building block can be used to construct trees of any size, using at most one child at each level that is not either perfectly sparse or perfectly dense.  Bulk methods are used where possible.
> For large trees this can produce up to 30x throughput improvement and 30% allocation reduction vs 3.0 (TBC, and to be tested vs 4.0).
> {{FastBuilder}} is introduced for building a tree in-order (or in reverse) without duplicate elements to resolve, without necessarily knowing the size upfront.  This meets the needs of most use cases.  Data is built directly into nodes, with up to one already-constructed node, and one partially constructed node, on each level, being mutated to share their contents in the event of insufficient data to populate the tree.  These builders are thread-locally shared.  These leads to minimal copying, the same sharing of {{sizeMap}} as above, zero wasted allocations, and results in minimal difference in performance between utilising the less-ergonomic static build and builder approach.
> For large trees this leads to ~4.5x throughput improvement, and 70% reduction in allocations vs a normal Builder.  For small trees performance is comparable, but allocations similarly reduced.
> h2. Inserting
> It turns out that we only ever insert another tree into a tree, so we exploit this to implement an efficient union of two trees, operating on them directly via stacks in the transformer, instead of via a collection interface.  A builder-like object is introduced that shares functionality with {{FastBuilder}}, and permits us to build the result of the union directly into the final nodes, reusing as much of the original trees as possible.  Bulk methods are used where possible.
> The result is not _uniformly_ faster, but is _significantly_ faster on average: median _improvement_ of 1.4x (that is, 2.4x total throughput), mean improvement of 10x.  Worst reduction is 30%, and it may be that we can isolate and alleviate that.  Allocations are also reduced significantly, with a median of 30% and mean of 42% for the tested workloads.  As the trees get larger the improvement drops, but remains uniformly lower.
> h2. Transforming
> Transformations garbage overhead is minimal, i.e. the main allocations are those necessary to represent the new tree.  It is significantly faster and particularly more efficient when removing elements, utilising the shared functionality of the builder and transformer objects to define an efficient builder that reuses as much of the original tree as possible. 
> We also introduce dedicated {{transform}} methods (that forbid returning {{null}}), and {{BiFunction}} transformations to permit efficient follow-ups.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org