Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2015/02/11 00:23:13 UTC

[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

    [ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315157#comment-14315157 ] 

Benedict commented on CASSANDRA-7282:
-------------------------------------

I've pushed an updated branch [here|https://github.com/belliottsmith/cassandra/tree/7282-withcomments] that I hope addresses all of your concerns.

bq. LongNonBlockingHashOrderedMapTest times out on default settings for ant long-test on my box

The test is really designed to be run on a machine with at least 8 physical cores, and it may take an excessively long time on fewer. In general it's hard to make a test like this terminate within a predictable time horizon, and the longer it runs the better (ideally we'd just leave it running for days). I've tweaked the settings slightly to make it run better on lower core counts while still actually performing some useful work.

bq. this.index aliasing in predecessor appears redundant

This is necessary: it saves us fetching a potentially different version of the index on each access, and incurring the memory access cost of each such read.
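The pattern being described — copying a volatile field into a local once, rather than re-reading it throughout a method — can be sketched roughly as follows. The class and field names here are illustrative only, not taken from the actual patch:

```java
// Illustrative sketch of volatile-field aliasing; names are hypothetical,
// not from the 7282 branch.
public class AliasingSketch
{
    private volatile int[] index = new int[] { 5, 10, 15, 20 };

    // Re-reads the volatile field on every access: each read pays the
    // memory-access cost, and may observe a *different* array if a
    // concurrent resize swaps the field mid-scan.
    int predecessorNaive(int key)
    {
        int result = -1;
        for (int i = 0; i < index.length; i++)
            if (index[i] < key)
                result = index[i];
        return result;
    }

    // Reads the volatile exactly once; every subsequent access hits the
    // same snapshot, so the scan is internally consistent and cheaper.
    int predecessorAliased(int key)
    {
        int[] index = this.index;   // single volatile read
        int result = -1;
        for (int i = 0; i < index.length; i++)
            if (index[i] < key)
                result = index[i];
        return result;
    }
}
```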

bq. Would it be worth making maybeResize threshold configurable? What's our reasoning for targeting 66%?

66% seems pretty reasonable (defaults are usually somewhere between 50% and 75% for hash tables). We don't generally expose this kind of internal feature to users, but I am not totally opposed to doing so. I doubt a value much different from this is likely to help much, though: a value below 50% could mean the index becomes more stale, so we have more scanning forwards to do on lookups, whereas a value above 100% means we start seeing significant worst-case collisions.
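As a rough illustration of what a 66% fill-factor trigger typically looks like (a hypothetical sketch, not the patch's actual maybeResize):

```java
// Hypothetical sketch of a ~66% fill-factor resize trigger; not the
// actual maybeResize implementation from the 7282 branch.
public class ResizeSketch
{
    int capacity = 16;
    private int size = 0;

    // Grow once the table is roughly two-thirds full,
    // using integer arithmetic to avoid floating point.
    private boolean shouldResize()
    {
        return size * 3 >= capacity * 2;   // size / capacity >= ~66%
    }

    void add()
    {
        size++;
        if (shouldResize())
            capacity <<= 1;   // only ever resizes upwards
    }
}
```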

bq. in putIfAbsent / maybeResize / resize: Just to confirm - are we relying on the executor to drop duplicate resize calls on the floor? If so, it might be worth documenting.

It doesn't matter if it does or does not; it only ever resizes upwards. We also shouldn't ever have a duplicate resize call, unless we haven't executed by the time the table's contents have doubled.
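The reason a duplicate resize call is harmless when resizing is monotonic can be sketched like this (a hypothetical illustration, not the patch's resize code): a second call with the same or a smaller target simply does nothing.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of why duplicate resize requests are benign when the structure
// only ever resizes upwards. Hypothetical; not the actual 7282 code.
public class MonotonicResize
{
    private final AtomicInteger capacity = new AtomicInteger(16);

    void resizeTo(int target)
    {
        int cur;
        // Only grow: a duplicate or stale request (target <= current
        // capacity) falls straight through the loop as a no-op.
        while ((cur = capacity.get()) < target)
            if (capacity.compareAndSet(cur, target))
                return;
    }

    int capacity()
    {
        return capacity.get();
    }
}
```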

bq. Why do we bit shift on generation of index node instead of just using integer value?

This is a standard practice for me: shifts in multiples of 10 are powers of 1024 (~1000). I find it much neater than lots of "magic constants" once you get above 1024 (e.g. 1048576 or 1073741824), or snippets like 1024L * 1024L * 1024L; 1 << 30 is much clearer (representing 1024^3). However it's of pretty marginal utility when working with small quantities, so I've switched it. I'd prefer to see it used more widely in the codebase, though.

I've spent a while trying to comment very clearly the algorithm at the top of the class, with diagrams. Please let me know if it's a success, and if there's anything more I can do to clarify it.

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipListMap of DecoratedKey -> Partition in our memtables. Maintaining this is an O(lg(n)) operation; since the vast majority of users use a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The list would impose the normal order on the collection, but a hash index would live alongside as part of the same data structure, simply mapping into the list and permitting O(1) lookups and inserts.
> I've chosen to implement this initial version as a linked-list node per item, but we can optimise this in future by storing fatter nodes that permit a cache line's worth of hashes to be checked at once, further reducing the constant-factor costs for lookups.
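The hybrid structure described in the issue can be sketched very roughly as the "two views, one logical map" principle below. This is a toy illustration only: the real structure is a single lock-free data structure whose hash index maps directly into the ordered list (which is what makes inserts O(1) as well), whereas this sketch uses two plain JDK collections and only demonstrates the O(1)-lookup side.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy sketch of the hybrid idea: an ordered view provides iteration order,
// while a hash index alongside it answers point lookups in O(1).
// Not the patch's implementation, and not thread-safe.
public class HybridMapSketch<K extends Comparable<K>, V>
{
    private final TreeMap<K, V> ordered = new TreeMap<>(); // canonical order
    private final Map<K, V> index = new HashMap<>();       // O(1) lookups

    public V putIfAbsent(K key, V value)
    {
        V prev = index.putIfAbsent(key, value);
        if (prev == null)
            ordered.put(key, value);
        return prev;
    }

    public V get(K key)
    {
        return index.get(key);     // no tree traversal needed
    }

    public K firstKey()
    {
        return ordered.firstKey(); // ordered view still available
    }
}
```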



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)