You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Benedict Elliott Smith (Jira)" <ji...@apache.org> on 2020/01/11 00:39:00 UTC
[jira] [Comment Edited] (CASSANDRA-15389) Minimize BTree iterator allocations

    [ https://issues.apache.org/jira/browse/CASSANDRA-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013292#comment-17013292 ] 

Benedict Elliott Smith edited comment on CASSANDRA-15389 at 1/11/20 12:38 AM:
------------------------------------------------------------------------------

bq. Good idea about complex data accumulation in collect stats. I ended up adding and using BiLongAccumulator to make all cases garbage free.
 
Sounds great!  Though looking at the code currently posted, I think it’s still capturing a reference to the {{collector}}?  I think you may have simply not posted your latest changes.

bq. This seems like it makes sense, but everything may just be looking like a nail to me, so let me know if you have a better idea
 
My personal view is it's absolutely fine to duplicate code somewhere like this, and I’ve done that in some patches I'll be posting soon.  I think clarity for the reader is also valuable, and if the methods are maintained next to each other and clearly marked as copy/paste variants because we don't have a macro language, I think that's fine and probably preferable.
 
However, I think in this case it's fine either way.  Looking at your implementation I realise it’s possible to simply define {{accept(Consumer<V> c, V v)}} in terms of {{accept(BiConsumer<V1,V2> c, V1 v1, V2 v2)}} by simply invoking {{accept(Consumer::accept, c, v)}}, which should be more than clear enough.

It’s a shame this is a recursive function so we won’t be able to benefit from inlining, and will probably incur an extra virtual invocation cost.  It might be that we anyway win by splitting implementations, but it may not be worth overthinking.
 
Other random thoughts:

* I think it might be helpful to treat {{apply}} to the same restructure you’ve done to {{accumulate}} - which helped me a great deal in reading that.
* {{isStopSentinel}}: do we use {{Long.MIN_VALUE}} anymore?  Should we also consider optionally permitting a stop sentinel to be provided, so we can make {{minDeletionTime}} more efficient?  I don’t think we have anywhere that needs more than one stop sentinel anymore, so I think it’s probably an acceptable complication.
* nit: {{ComplexColumnData.accumulate}} should we call the parameter e.g. {{initialValue}} to distinguish it from those {{accumulate}} methods that take a value to start from in the tree?


was (Author: benedict):
bq. Good idea about complex data accumulation in collect stats. I ended up adding and using BiLongAccumulator to make all cases garbage free.
 
Sounds great!  Though looking at the code currently posted, I think it’s still capturing a reference to the {{collector}}?  I think you may have simply not posted your latest changes.

bq. This seems like it makes sense, but everything may just be looking like a nail to me, so let me know if you have a better idea
 
My personal view is it's absolutely fine to duplicate code somewhere like this, and I’ve done that in some patches I'll be posting soon.  I think clarity for the reader is also valuable, and if the methods are maintained next to each other and clearly marked as copy/paste variants because we don't have a macro language, I think that's fine and probably preferable.
 
However, I think in this case it's fine either way.  Looking at your implementation I realise it’s possible to simply define {{accept(Consumer<V> c, V v)}} in terms of {{accept(BiConsumer<V1,V2> c, V1 v1, V2 v2)}} by simply invoking {{accept(Consumer::accept, c, v)}}, which should be more than clear enough.

It’s a shame this is a recursive function so we won’t be able to benefit from inlining, and will probably incur an extra virtual invocation cost.  It might be that we anyway win by splitting implementations, but it may not be worth overthinking.
 
FWIW, I think it might be helpful to treat {{apply}} to the same restructure you’ve done to {{accumulate}} - which helped me a great deal in reading that.
 
{{isStopSentinel}}: do we use {{Long.MIN_VALUE}} anymore?  Should we also consider optionally permitting a stop sentinel to be provided, so we can make {{minDeletionTime}} more efficient?  I don’t think we have anywhere that needs more than one stop sentinel anymore, so I think it’s probably an acceptable complication.
 
nit: {{ComplexColumnData.accumulate}} should we call the parameter e.g. {{initialValue}} to distinguish it from those {{accumulate}} methods that take a value to start from in the tree?

> Minimize BTree iterator allocations
> -----------------------------------
>
>                 Key: CASSANDRA-15389
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15389
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Local/Compaction
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Normal
>             Fix For: 4.0
>
>
> Allocations of BTree iterators contribute a lot amount of garbage to the compaction and read paths.
> This patch removes most btree iterator allocations on hot paths by:
>  • using Row#apply where appropriate on frequently called methods (Row#digest, Row#validateData
>  • adding BTree accumulate method. Like the apply method, this method walks the btree with a function that takes and returns a long argument, this eliminates iterator allocations without adding helper object allocations (BTreeRow#hasComplex, BTreeRow#hasInvalidDeletions, BTreeRow#dataSize, BTreeRow#unsharedHeapSizeExcludingData, Rows#collectStats, UnfilteredSerializer#serializedRowBodySize) as well as eliminating the allocation of helper objects in places where apply was used previously^[1]^.
>  • Create map of columns in SerializationHeader, this lets us avoid allocating a btree search iterator for each row we serialize.
> These optimizations reduce garbage created during compaction by up to 13.5%
>  
> [1] the memory test does measure memory allocated by lambdas capturing objects



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org