You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Branimir Lambov (JIRA)" <ji...@apache.org> on 2015/06/02 18:43:49 UTC
[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

    [ https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569392#comment-14569392 ] 

Branimir Lambov commented on CASSANDRA-8915:
--------------------------------------------

I wasn't completely sure of the correctness of the flagging approach and I wrote the following proof to make sure I'm not missing something:
{panel}
In the discussion below ρ(N) stands for parent of N and σ(N) stands for the 'equalParent' predicate, i.e. σ(N) ↔ ρ(N) = N.

We define "ep-heap leading to M" to be a binary tree of nodes where each node N satisfies ρ(N) ≤ N & σ(N) ↔ ρ(N) = N and the root has M as its parent. Full ep-heaps are ep-heaps leading to top, where 'top' is assumed to be a special value strictly smaller than any other (thus σ(R) is always false for the root of a full ep-heap). For simplicity we will take all heaps to end at a special node called 'bottom' whose value is assumed to be greater than any other value.

The critical step here is adding a new value C > M to combine two existing well-formed ep-heaps P and R leading to the same M into a bigger single well-formed ep-heap leading to M. Without loss of generality we can take P ≤ R (otherwise we can swap the places of P and R in the discussion below).

We distinguish three cases:
# C < P (necessarily ¬σ(P)): We place C at the top by setting ρ(P), ρ(R) := C, ρ(C) := M and leaving σ(P), σ(R) := false (Note that σ(R) cannot be true as that contradicts with C > M). σ(C) := false. The ep-heap rooted at C is well-formed leading to M.
# C = P (necessarily ¬σ(P)): We place C at the top, setting ρ(P), ρ(R) := C, ρ(C) := M, σ(P) := true and σ(R) := (P = R). σ(C) := false. The heap rooted at C is well-formed as C = P & σ(P) as well as C = P ≤ R & C = R if and only if σ(R).
# C > P (σ(P) can be true or false): Set ρ(R) := P. If ¬σ(P), set σ(R) := (P = R), otherwise leave σ(R) unchanged. Use the algorithm recursively to turn the two ep-subheaps leading to P at its children and the new value C > P into an ep-heap leading to P. Let its root be Q. The necessary conditions for an element of an ep-heap hold for Q. They are also true for R as P ≤ R and either σ(P) in which case P = M and σ(R) if and only if R = M = P, or ¬σ(P) and σ(R) is explicitly set to be true if and only if P = R. We also have ρ(P) = M as well as P = M if and only if σ(P), thus the constructed ep-heap rooted at P leads to M.

The recursion always ends for finite heaps as we must reach the first case due to the ordering of bottom.

For performance the order of comparing elements and identifying the case to use can be changed to sequentially checking:
- σ(P): apply pt. 3 for P and return (P must be ≤ R). No update of σ(P)/σ(R) necessary.
- σ(R): apply pt. 3 for R and return (R must now be < P). No update of σ(P)/σ(R) necessary.
- compare P and R.
- If R < P, perform the next steps with P and R swapped.
- compare C and P.
- if C < P apply pt. 1.
- if C = P apply pt. 2.
- if C > P apply pt. 3.

Once an ep-heap with root R is formed, it trivially follows that for each element E in the heap E = R if and only if σ is true for E and the chain of parents of E leading to but not including the root. To consume the equal elements we can thus walk the sub-tree formed by the children which have σ set.

To advance the elements equal to the root, we can:
- Process the elements in the deepest level of the σ sub-tree in any order. For each of them:
  -- Advance the element. Let the new value be C.
  -- The children P, R of the element must have ¬σ(P), ¬σ(R). Overwrite ρ(P) and ρ(R) with top*. P and R now form ep-heaps leading to top.
  -- Use the algorithm to combine P, R and C into a new ep-heap leading to top.
  -- Let its root be Q. Because the heap leads to top we have ¬σ(Q).
- Continue with the next deepest level of the hierarchy until the top is reached.

\* The algorithm never actually uses the values of the parent function ρ, only the fact that σ is set correctly with respect to that parent. This means this step does not need to be actually performed.
{panel}

Updated [the branch|https://github.com/apache/cassandra/compare/trunk...blambov:8915-mergeiterator] to use this solution and cleaned the code a bit.

> Improve MergeIterator performance
> ---------------------------------
>
>                 Key: CASSANDRA-8915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: compaction, performance
>             Fix For: 3.x
>
>
> The implementation of {{MergeIterator}} uses a priority queue and applies a pair of {{poll}}+{{add}} operations for every item in the resulting sequence. This is quite inefficient as {{poll}} necessarily applies at least {{log N}} comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, for example in the case where the inputs largely don't overlap (where {{N}} is the number of iterators being merged).
> This can easily be replaced with a simple custom structure that can perform replacement of the top of the queue in a single step, which will very often complete after a couple of comparisons and in the worst case scenarios will match the complexity of the current implementation.
> This should significantly improve merge performance for iterators with limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)