Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2018/08/06 11:09:00 UTC
[jira] [Commented] (LUCENE-8438) RAMDirectory speed improvements and cleanup

    [ https://issues.apache.org/jira/browse/LUCENE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570052#comment-16570052 ] 

Robert Muir commented on LUCENE-8438:
-------------------------------------

I do think the new code looks pretty clean and I like the additional checks in the code (which were desperately needed), but I have some concerns about the timing. How do we reduce the risks here release-wise?

I'm pretty much against committing this to trunk and then immediately trying to start spinning up Lucene 8.0. The problem is that this code has its tentacles in everything: a bug in it will impact far more than just Windows users who can't use mmap over tmpfs :) Core codecs etc. are using little RAMOutputStreams here and there for various crap.

We need a strategy to reduce the risks here for so many changes to o.a.l.store code. And we should honestly discuss whether the tradeoffs are the right ones. For Lucene 8, which Adrien wants to work on soon, I would rather we just tell users to use mmap over tmpfs and not have corruption.
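
For anyone unfamiliar with the "mmap over tmpfs" suggestion, here is a rough sketch of the idea using only the JDK. It is not Lucene code: the class name, method names and the /dev/shm location are assumptions for illustration (/dev/shm is typically a tmpfs mount on Linux, and the sketch falls back to java.io.tmpdir elsewhere). It writes a throw-away file to a memory-backed directory and reads it back through a memory-mapped buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of Lucene or of the patch under discussion.
class MmapOverTmpfsSketch {

    // Assumption: /dev/shm is a tmpfs mount. If it is missing or not writable,
    // fall back to the regular temp directory (which may be disk-backed).
    static Path pickBaseDir() {
        Path shm = Paths.get("/dev/shm");
        if (Files.isDirectory(shm) && Files.isWritable(shm)) {
            return shm;
        }
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    // Writes the payload to a throw-away file, maps it read-only and sums its
    // bytes (as unsigned values) through the mapping.
    static long sumMapped(byte[] payload) throws IOException {
        Path dir = Files.createTempDirectory(pickBaseDir(), "throwaway-index");
        Path file = dir.resolve("data.bin");
        try {
            Files.write(file, payload);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                // duplicate() gives an independent position/limit over the same
                // memory, so each reader thread could work on its own view.
                ByteBuffer view = mapped.duplicate();
                long sum = 0;
                while (view.hasRemaining()) {
                    sum += view.get() & 0xFF;
                }
                return sum;
            }
        } finally {
            // Best-effort cleanup; on some platforms a file that is still
            // mapped cannot be deleted, so failures are ignored here.
            file.toFile().delete();
            dir.toFile().delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("base dir: " + pickBaseDir());
        System.out.println("sum: " + sumMapped(new byte[] {1, 2, 3, 4}));
    }
}
```

With Lucene itself, the equivalent would be to point an MMapDirectory at a path on the tmpfs mount rather than mapping files by hand.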


> RAMDirectory speed improvements and cleanup
> -------------------------------------------
>
>                 Key: LUCENE-8438
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8438
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: capture-4.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> RAMDirectory screams for a cleanup. It is used and abused in many places and even if we discourage its use in favor of native (mmapped) buffers, there seem to be benefits of keeping RAMDirectory available (quick throw-away indexes without the need to set up external tmpfs, for example).
> Currently RAMDirectory performs very poorly under concurrent loads. The implementation is also open to all sorts of abuses – the streams can be reset and are used all over the place as temporary buffers, even without the presence of RAMDirectory itself. This complicates the implementation and is pretty confusing.
> As an example of how dramatically slow RAMDirectory is under concurrent load, consider this PoC pseudo-benchmark. It creates a single monolithic segment with 500K very short documents (single field, with norms). The index is ~60MB once created. We then run semi-complex Boolean queries on top of that index from N concurrent threads. The attached capture-4 shows the result (queries per second over 5-second spans) for a varying number of concurrent threads on an AWS machine with 32 CPUs available (of which 16 seem to be real and 16 hyper-threaded). That red line at the bottom (which drops compared to single-threaded performance) is the current RAMDirectory. RAMDirectory2 is an alternative implementation I wrote that uses ByteBuffers. Yes, it's slower than the native mmapped implementation, but a *lot* faster than the current RAMDirectory (and more GC-friendly because it uses dynamic progressive block scaling internally).


